Problem
You can securely access data in an Azure Data Lake Storage Gen2 (ADLS Gen2) storage account using OAuth 2.0 with an Azure Active Directory (Azure AD) application service principal for authentication.
You are trying to access external tables (tables stored outside of the root storage location) which are stored on ADLS Gen2. Access fails with an ADLException error and an IOException : AADToken timeout error.
WARN DeltaLog: Failed to parse dbfs:/mnt/<table path in ADLS Gen2 storage container>. This may happen if there was an error during read operation, or a file appears to be partial. Sleeping and trying again.
com.microsoft.azure.datalake.store.ADLException: Error getting info for file <table path in ADLS Gen2 storage container>
Error fetching access token
Operation null failed with exception java.io.IOException : AADToken: HTTP connection failed for getting token from AzureAD due to timeout. Client Request Id :<directory-id> Latency(ns) : 180152012
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException]
 [ServerRequestId:null]
Caused by: java.io.IOException: Server returned HTTP response code: 401 for URL: https://login.microsoftonline.com/<directory-id>/oauth2/token
    at sun.reflect.GeneratedConstructorAccessor118.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Caused by: java.io.IOException: Server returned HTTP response code: 401 for URL: https://login.microsoftonline.com/<directory-id>/oauth2/token
    at sun.reflect.GeneratedConstructorAccessor118.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Cause
Access to ADLS Gen2 storage fails if the client secret token associated with the Azure Active Directory (Azure AD) application service principal is expired or invalid.
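If you collect driver logs or exception text, a rough check like the following may help you tell this failure apart from other storage errors. It is a minimal sketch: the function name `looks_like_bad_client_secret` is hypothetical, and the two markers it looks for are simply taken from the sample log in this article, so treat a match as a hint rather than proof.

```python
import re

# Markers copied from the sample log in this article (a heuristic, not an official error code).
_TOKEN_MARKER = "AADToken"
_HTTP_401_FROM_AAD = re.compile(
    r"HTTP response code: 401 for URL: https://login\.microsoftonline\.com/"
)


def looks_like_bad_client_secret(error_text: str) -> bool:
    """Return True if the text contains both markers seen in the sample log."""
    return _TOKEN_MARKER in error_text and bool(_HTTP_401_FROM_AAD.search(error_text))
```

A 401 response from the token endpoint generally means the credentials were rejected, which is why the check requires it in addition to the `AADToken` marker.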
Solution
Review the storage account access setup and check whether the client secret has expired. If it has, create a new client secret and then either remount the ADLS Gen2 storage container using the new secret, or update the client secret in the ADLS Gen2 storage account configuration.
Review existing storage account secrets
Check to see if the existing client secret is expired.
- Open the Azure portal.
- Click Azure Active Directory.
- In the menu on the left, look under Manage and click App registrations.
- On the All applications tab, locate the application created for Azure Databricks. You can search the app registrations by Display name or by Application (client) ID.
- Click on your application.
- In the menu on the left, look under Manage and click Certificates & secrets.
- Review the Client secrets section and check the date in the Expires column.
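If you manage many app registrations, you may prefer to check expiry dates in code rather than in the portal. The sketch below assumes you have already exported the secrets as a list of dictionaries shaped like the `passwordCredentials` entries of a Microsoft Graph application object, each with an `endDateTime` field. Fetching that data is not shown, and the helper names `parse_graph_timestamp` and `expired_secrets` are hypothetical.

```python
import re
from datetime import datetime, timezone

_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|[+-]\d{2}:\d{2})?$"
)


def parse_graph_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, ignoring fractional seconds.

    A missing offset or a trailing 'Z' is treated as UTC (an assumption).
    """
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"Unrecognized timestamp: {value!r}")
    base, offset = match.groups()
    if offset in (None, "Z"):
        offset = "+00:00"
    return datetime.fromisoformat(base + offset)


def expired_secrets(password_credentials, now=None):
    """Return the entries whose endDateTime is at or before 'now'."""
    now = now or datetime.now(timezone.utc)
    return [
        cred
        for cred in password_credentials
        if parse_graph_timestamp(cred["endDateTime"]) <= now
    ]
```

Passing `now` explicitly makes the check reproducible; leave it out to compare against the current time.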
Create a new secret token
If the existing client secret is expired, you must create a new token.
- Click New client secret.
- Enter a description and a duration for the secret.
- Click Add.
- The client secret is displayed. Copy the Value. It cannot be retrieved after you leave the page.
Remount ADLS Gen2 storage with new secret
Once you have generated a new client secret, you can unmount the existing ADLS Gen2 storage, update the secret information, and then remount the storage.
- Unmount the existing mount point.
%python
dbutils.fs.unmount("/mnt/<mount-name>")
Review the dbutils.fs.unmount documentation for more information.
- Remount the storage account with the new client secret.
Replace
- <application-id> with the Application (client) ID for the Azure Active Directory application
- <container-name> with the name of the container
- <directory-id> with the Directory (tenant) ID for the Azure Active Directory application
- <scope-name> with the Databricks secret scope name
- <service-credential-key> with the name of the key containing the client secret
- <storage-account> with the name of the Azure storage account
%python
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<application-id>",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add <directory-path> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account>.dfs.core.windows.net/<directory-path>",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs)
Review the mount an Azure Blob storage container documentation for more information.
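The unmount and remount steps can be combined into one helper, which is convenient when several mount points share the same service principal. This is a sketch, not an official utility: the function name `remount` is hypothetical, and `dbutils` is passed in as an argument so the helper can be exercised outside a notebook. It relies only on `dbutils.fs.mounts`, `dbutils.fs.unmount`, `dbutils.fs.mount`, and `dbutils.fs.ls`.

```python
def remount(dbutils, source, mount_point, configs):
    """Unmount mount_point if it is currently mounted, then mount it again.

    Returns the listing of the mount root as a quick access check.
    """
    mounted = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in mounted:
        dbutils.fs.unmount(mount_point)

    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)

    # Listing the mount root needs a valid token, so a bad secret should fail here
    # instead of later during a table read.
    return dbutils.fs.ls(mount_point)
```

In a notebook you would call it with the `configs` dictionary built from the new secret, for example `remount(dbutils, "abfss://<container-name>@<storage-account>.dfs.core.windows.net/", "/mnt/<mount-name>", configs)`.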
Replace client secret in the storage account config
As an alternative to updating individual mounts, you can replace the client secret in the storage account authentication configuration. The storage account must be set up for direct access.
%python
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
# Read the new client secret from the secret scope instead of hard-coding it.
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
Replace
- <application-id> with the Application (client) ID for the Azure Active Directory application
- <directory-id> with the Directory (tenant) ID for the Azure Active Directory application
- <scope-name> with the Databricks secret scope name
- <service-credential-key> with the name of the key containing the client secret
- <storage-account> with the name of the Azure storage account
Review the access ADLS Gen2 documentation for more information.
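Because the five configuration keys differ only in the storage account suffix, you can build them in one place and apply them in a loop. The following is a minimal sketch under that assumption. The helper names `oauth_conf` and `apply_conf` are hypothetical, and the key names and token endpoint are the ones used in this article.

```python
def oauth_conf(storage_account, application_id, directory_id, client_secret):
    """Build the per-account OAuth settings for direct access to ADLS Gen2."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": application_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }


def apply_conf(spark, conf):
    """Apply each setting to the active Spark session."""
    for key, value in conf.items():
        spark.conf.set(key, value)
```

In a notebook, pass the secret value itself rather than its key name, for example `apply_conf(spark, oauth_conf("<storage-account>", "<application-id>", "<directory-id>", dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>")))`.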