ABFS client hangs if an incorrect client ID or wrong path is used
Problem
You are using Azure Data Lake Storage (ADLS) Gen2. When you try to access an Azure Blob File System (ABFS) path from a Databricks cluster, the command hangs.
If you enable debug logging, the following stack trace appears in the driver logs:
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/b9b831a9-6c10-40bf-86f3-489ed83c81e8/oauth2/token
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
at sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:91)
at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1484)
at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1482)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:782)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1481)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenSingleCall(AzureADAuthenticator.java:254)
... 31 more
Cause
If ABFS is configured on a cluster with an incorrect value for the property fs.azure.account.oauth2.client.id, or if you try to access an explicit path of the form abfss://myContainer@myStorageAccount.dfs.core.windows.net/... where myStorageAccount does not exist, the ABFS driver enters a retry loop and becomes unresponsive. The command eventually fails, but because it retries so many times, it appears to hang.
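One way to surface a bad client ID quickly, rather than waiting for the driver's retries to run out, is to request a token directly from the endpoint shown in the stack trace, with a short timeout. The following is a minimal sketch, not part of the ABFS driver: the function names are hypothetical, the resource value https://storage.azure.com/ is an assumption about the target service, and the GUID check only catches malformed IDs, not well-formed IDs that belong to no application.

```python
import urllib.error
import urllib.parse
import urllib.request
import uuid


def build_token_request(tenant_id, client_id, client_secret):
    """Build a client-credentials POST for the Azure AD v1 token endpoint.

    Hypothetical helper: raises ValueError if client_id is not a
    well-formed GUID, which catches obvious copy-and-paste mistakes.
    """
    uuid.UUID(client_id)  # raises ValueError on a malformed ID
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumption: the token is requested for Azure Storage.
        "resource": "https://storage.azure.com/",
    }).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def check_credentials(tenant_id, client_id, client_secret, timeout=10):
    """Send one token request and return (HTTP status, detail text).

    A single call with a timeout fails fast instead of retrying.
    """
    request = build_token_request(tenant_id, client_id, client_secret)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, "token issued"
    except urllib.error.HTTPError as err:
        # The response body usually explains why the request was rejected.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling check_credentials with the same tenant ID, client ID, and secret that the cluster uses should return a status other than 200 if the configuration is the problem. Network failures raise urllib.error.URLError, which this sketch deliberately leaves unhandled.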
If you try to access an incorrect path with an existing storage account, you will see a 404 error message. The system does not hang in this case.
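To tell a nonexistent storage account apart from a wrong path before running a command, you can check whether the account's host name resolves. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and it assumes that a storage account that does not exist has no DNS record, which may not hold in every network setup (for example, with private endpoints or custom DNS).

```python
import socket
from urllib.parse import urlparse


def parse_abfss_uri(uri):
    """Split an abfs/abfss URI into (container, account, host).

    Hypothetical helper: raises ValueError if the URI does not have
    the form abfss://<container>@<account>.dfs.core.windows.net/<path>.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri}")
    container, separator, host = parsed.netloc.partition("@")
    if not separator or not container or not host:
        raise ValueError(f"expected <container>@<account-host> in: {uri}")
    account = host.split(".", 1)[0]
    return container, account, host


def storage_host_resolves(host):
    """Return True if the host name has a DNS record, False otherwise."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False
```

For example, parse_abfss_uri returns the host for a given path, and storage_host_resolves(host) returning False suggests that the account name is misspelled or the account does not exist. A True result only shows that the account name resolves; a wrong container or directory would still produce the 404 error described above.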