Problem
You are using Azure Data Lake Storage (ADLS) Gen2. When you try to access an Azure Blob File System (ABFS) path from a Databricks cluster, the command hangs.
If you enable debug logging, the driver logs show the following stack trace:
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/b9b831a9-6c10-40bf-86f3-489ed83c81e8/oauth2/token
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
    at sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:91)
    at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1484)
    at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1482)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:782)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1481)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenSingleCall(AzureADAuthenticator.java:254)
    ... 31 more
Cause
If ABFS is configured on a cluster with an incorrect value for the fs.azure.account.oauth2.client.id property, or if you try to access an explicit path of the form abfss://myContainer@myStorageAccount.dfs.core.windows.net/... where myStorageAccount does not exist, the ABFS driver enters a retry loop and becomes unresponsive. The command eventually fails, but it retries so many times that it appears to hang.
If you try to access an incorrect path in a storage account that does exist, you get a 404 error instead, and the command does not hang.
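Because the hang depends on whether the storage account in the path exists, it can help to pull the account name out of the URI before checking it. The following is a minimal sketch using only the Python standard library; `parse_abfss_path` is a hypothetical helper, not part of any Databricks or Hadoop API, and it assumes the `abfss://<container>@<account>.dfs.core.windows.net/<path>` form described above.

```python
from typing import Tuple
from urllib.parse import urlparse

DFS_SUFFIX = ".dfs.core.windows.net"


def parse_abfss_path(path: str) -> Tuple[str, str, str]:
    """Split an abfss:// URI into (container, storage_account, object_path).

    Hypothetical helper for illustration; raises ValueError on malformed input.
    """
    parsed = urlparse(path)
    if parsed.scheme != "abfss":
        raise ValueError(f"expected abfss:// scheme, got {parsed.scheme!r}")
    # netloc keeps its original case, e.g. "myContainer@myStorageAccount.dfs.core.windows.net"
    container, sep, host = parsed.netloc.partition("@")
    if not sep or not container:
        raise ValueError("missing '<container>@' before the host name")
    if not host.endswith(DFS_SUFFIX):
        raise ValueError(f"host {host!r} does not end with {DFS_SUFFIX}")
    account = host[: -len(DFS_SUFFIX)]
    if not account:
        raise ValueError("storage account name is empty")
    # Object path without the leading slash; may be empty for the container root.
    return container, account, parsed.path.lstrip("/")
```

For example, `parse_abfss_path("abfss://myContainer@myStorageAccount.dfs.core.windows.net/data/file.csv")` returns `("myContainer", "myStorageAccount", "data/file.csv")`. The account name it returns is the part that must refer to an existing storage account.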
Solution
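Verify that all credentials used to access ABFS data are correct, including the value of fs.azure.account.oauth2.client.id. Also verify that the ABFS path you are trying to access exists, in particular that the storage account name is correct. If either is wrong, the command can hang as described above.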
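One way to catch both problems before the driver enters its retry loop is a quick preflight check run from the notebook. This is a sketch under stated assumptions, not a Databricks feature: it checks that the client ID is a well-formed GUID (the format Azure AD uses for application IDs) and that the storage account's DFS endpoint resolves in DNS, assuming that a nonexistent account has no DNS record. A passing check does not prove the credentials are correct or that the object path exists; a well-formed GUID can still belong to the wrong application. `preflight_abfs` and its `resolve` parameter are hypothetical names.

```python
import socket
import uuid
from typing import Callable, List

DFS_SUFFIX = ".dfs.core.windows.net"


def _default_resolve(host: str) -> bool:
    """Return True if the host name resolves in DNS (HTTPS port assumed)."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # host names that cannot be IDNA-encoded (e.g. labels that are too long).
        return False


def preflight_abfs(
    storage_account: str,
    client_id: str,
    resolve: Callable[[str], bool] = _default_resolve,
) -> List[str]:
    """Hypothetical pre-check; returns a list of problems (empty means none found)."""
    problems: List[str] = []
    # Azure AD application (client) IDs are GUIDs. A malformed value is
    # certainly wrong, but a well-formed GUID may still be the wrong app.
    try:
        uuid.UUID(client_id)
    except ValueError:
        problems.append(f"client id {client_id!r} is not a GUID")
    # Assumption: a storage account that does not exist has no DNS record
    # for its DFS endpoint, so this lookup fails fast instead of retrying.
    host = storage_account + DFS_SUFFIX
    if not resolve(host):
        problems.append(f"{host} does not resolve; check the storage account name")
    return problems
```

In a notebook you might call it with the account name from the path and the value you configured for fs.azure.account.oauth2.client.id before reading the data; if the returned list is non-empty, fix the configuration or path rather than running the command. The `resolve` parameter exists so the DNS lookup can be swapped out, for example in tests.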