Reading a table fails due to AAD token timeout on ADLS Gen2

Accessing ADLS Gen2 storage fails if the AAD service principal token is expired or invalid.

Written by John.Lourdu

Last published at: November 30th, 2022


Problem

You can securely access data in an ADLS Gen2 storage account using OAuth 2.0 with an Azure Active Directory (Azure AD) application service principal for authentication.

You are trying to access external tables (tables stored outside of the root storage location) which are stored on ADLS Gen2. Access fails with an ADLException error and an IOException : AADToken timeout error.

WARN DeltaLog: Failed to parse dbfs:/mnt/<table path in ADLS Gen2 storage container>. This may happen if there was an error during read operation, or a file appears to be partial. Sleeping and trying again.
com.microsoft.azure.datalake.store.ADLException: Error getting info for file <table path in ADLS Gen2 storage container>
Error fetching access tokenOperation null failed with exception java.io.IOException : AADToken: HTTP connection failed for getting token from AzureAD due to timeout. Client Request Id :<directory-id> Latency(ns) : 180152012
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException]
[ServerRequestId:null]
Caused by: java.io.IOException: Server returned HTTP response code: 401 for URL: https://login.microsoftonline.com/<directory-id>/oauth2/token
  at sun.reflect.GeneratedConstructorAccessor118.newInstance(Unknown Source)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
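
The first exception in the log mentions a timeout, but the nested cause is an HTTP 401 from the token endpoint. The following is a minimal sketch for telling the two cases apart when scanning driver logs. The function name and the returned labels are hypothetical, and the pattern assumes the message format shown in the log above.

```python
import re


def classify_aad_failure(log_text: str) -> str:
    """Return a rough label for an AAD token failure found in log text."""
    if "AADToken" not in log_text:
        return "not-an-aad-token-error"
    # Look for the HTTP status reported for the token endpoint request.
    match = re.search(
        r"HTTP response code: (\d{3}) for URL: https://login\.microsoftonline\.com/",
        log_text,
    )
    if match and match.group(1) == "401":
        # Azure AD rejected the credentials; check the client secret.
        return "credentials-rejected"
    # No 401 was logged, so a real connectivity problem is still possible.
    return "possible-network-timeout"
```

A "credentials-rejected" result points to the client secret checks described in this article. Any other result suggests looking at network connectivity first.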

Cause

Access to ADLS Gen2 storage fails if the client secret token associated with the Azure Active Directory (Azure AD) application service principal is expired or invalid.
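
One way to confirm this cause without involving the cluster is to request a token directly from the endpoint shown in the error message. The sketch below uses only the Python standard library. The function names are hypothetical, and the `resource` value `https://storage.azure.com/` is an assumption about the audience used for Azure Storage tokens. Treating HTTP 400 as a rejection is also an assumption; the log above shows only 401.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_token_request(directory_id, application_id, client_secret):
    """Build a client-credentials token request for the Azure AD endpoint."""
    url = f"https://login.microsoftonline.com/{directory_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": application_id,
        "client_secret": client_secret,
        # Assumed audience for Azure Storage; not taken from the article.
        "resource": "https://storage.azure.com/",
    }).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def secret_is_accepted(directory_id, application_id, client_secret):
    """Return True if Azure AD issues a token, False if it rejects the request."""
    request = build_token_request(directory_id, application_id, client_secret)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return "access_token" in json.load(response)
    except urllib.error.HTTPError as err:
        if err.code in (400, 401):
            # Rejected credentials, for example an expired or wrong secret.
            return False
        raise
```

If `secret_is_accepted` returns `False` for the secret currently stored in your secret scope, the secret is the likely cause of the failure.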

Solution

Review the storage account access setup and check whether the client secret has expired. If it has, create a new client secret, and then either remount the ADLS Gen2 storage container using the new secret or update the client secret in the ADLS Gen2 storage account configuration.

Review existing storage account secrets

Check to see if the existing client secret is expired.

  1. Open the Azure portal.
  2. Click Azure Active Directory.
  3. In the menu on the left, look under Manage and click App registrations.
  4. On the All applications tab, locate the application created for Azure Databricks. You can search the app registrations by Display name or by Application (client) ID.
  5. Click on your application.                                             
  6. In the menu on the left, look under Manage and click Certificates & secrets.
  7. Review the Client secrets section and check the date in the Expires column.
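
If you track several service principals, it can help to compare the date from the Expires column with today's date in code. This is a minimal sketch with a hypothetical helper name. It assumes you have transcribed the expiry date in ISO format (YYYY-MM-DD); the portal may display dates in a different, locale-dependent format.

```python
from datetime import date


def days_until_expiry(expires_on: str, today: date) -> int:
    """Return days remaining before the secret expires (negative if expired)."""
    return (date.fromisoformat(expires_on) - today).days
```

For example, `days_until_expiry("2022-11-30", date.today())` returns a negative number once that secret has expired.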

Create a new secret token

If the existing client secret is expired, you must create a new token.

  1. Click New client secret.
  2. Enter a description and a duration for the secret.
  3. Click Add.
  4. The client secret is displayed. Copy the Value. It cannot be retrieved after you leave the page.

Warning

If you forget to copy the secret Value, you must repeat these steps. The Value cannot be retrieved once you leave the page. Returning to the page displays a masked version of the Value.
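
The mount example in this article reads the client secret from a Databricks secret scope, so the new Value must be stored under the same scope and key before you remount. The sketch below builds a request for the Databricks Secrets REST API (`/api/2.0/secrets/put`). The function name, workspace URL, and API token are hypothetical placeholders. It assumes a Databricks-backed secret scope; if the scope is backed by Azure Key Vault, update the secret in Key Vault instead.

```python
import json
import urllib.request


def build_put_secret_request(workspace_url, api_token, scope, key, value):
    """Build a request that stores a secret value in a Databricks secret scope."""
    payload = json.dumps(
        {"scope": scope, "key": key, "string_value": value}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/secrets/put",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pass the returned request to `urllib.request.urlopen` to send it. Avoid hard-coding the secret Value or the API token in a notebook.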

Remount ADLS Gen2 storage with new secret

Once you have generated a new client secret, you can unmount the existing ADLS Gen2 storage, update the secret information, and then remount the storage.

  1. Unmount the existing mount point.

    %python
    
    dbutils.fs.unmount("/mnt/<mount-name>")

    Review the dbutils.fs.unmount documentation for more information.

  2. Remount the storage account with the new client secret. In the following example, replace:

    • <application-id> with the Application (client) ID for the Azure Active Directory application
    • <container-name> with the name of the container
    • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application
    • <scope-name> with the Databricks secret scope name
    • <service-credential-key> with the name of the key containing the client secret
    • <storage-account> with the name of the Azure storage account


    %python
    
    configs = {"fs.azure.account.auth.type": "OAuth",
              "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
              "fs.azure.account.oauth2.client.id": "<application-id>",
              "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key>"),
              "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}
    
    # Optionally, you can add <directory-path> to the source URI of your mount point.
    dbutils.fs.mount(
      source = "abfss://<container-name>@<storage-account>.dfs.core.windows.net/<directory-path>",
      mount_point = "/mnt/<mount-name>",
      extra_configs = configs)

    Review the mount an Azure Blob storage container documentation for more information.
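
The unmount and remount steps above can be combined into a single helper. This is a sketch under stated assumptions, not a definitive implementation: `remount` is a hypothetical name, and `dbutils` is passed in as a parameter so the function can be exercised outside a notebook. It uses `dbutils.fs.mounts()` to skip the unmount when the mount point does not exist, and lists the mount root afterwards, which should trigger a token request and surface a bad secret immediately.

```python
def remount(dbutils, source, mount_point, configs):
    """Unmount mount_point if it is mounted, then mount source with configs."""
    mounted = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in mounted:
        dbutils.fs.unmount(mount_point)
    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=configs,
    )
    # Listing the mount root checks that the new credentials work.
    return dbutils.fs.ls(mount_point)
```

In a notebook, call it with the `configs` dictionary from step 2, for example `remount(dbutils, "abfss://<container-name>@<storage-account>.dfs.core.windows.net/", "/mnt/<mount-name>", configs)`. Other running clusters may need `dbutils.fs.refreshMounts()` before they see the updated mount.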

Replace client secret in the storage account config

As an alternative to updating individual mounts, you can replace the client secret in the storage account authentication configuration. The storage account must be set up for direct access. 

%python

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

Replace

  • <application-id> with the Application (client) ID for the Azure Active Directory application
  • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application
  • <scope-name> with the Databricks secret scope name
  • <service-credential-key> with the name of the key containing the client secret
  • <storage-account> with the name of the Azure storage account

Review the access ADLS Gen2 documentation for more information.
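
Because all five settings share the same storage account suffix, they can be generated from one function, which reduces the risk of a typo in one of the keys. The following is a minimal sketch; `oauth_settings` and `apply_settings` are hypothetical helper names, and `spark` is passed in as a parameter so the code can be exercised outside a notebook.

```python
def oauth_settings(storage_account, application_id, directory_id, client_secret):
    """Return the Spark settings for OAuth access to one storage account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": application_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }


def apply_settings(spark, settings):
    """Apply each setting to the current Spark session."""
    for key, value in settings.items():
        spark.conf.set(key, value)
```

In a notebook, pass the secret as `dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>")` rather than as a literal string, then call `apply_settings(spark, settings)`.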
