ABFS client hangs if an incorrect client ID or wrong path is used

Trying to access an Azure Blob File System (ABFS) path results in a hung command when using Azure Data Lake Storage Gen2 (ADLS).

Written by Adam Pavlacka

Last published at: June 1st, 2022

Problem

You are using Azure Data Lake Storage (ADLS) Gen2. When you try to access an Azure Blob File System (ABFS) path from a Databricks cluster, the command hangs.

If you enable debug logging, the driver logs show the following stack trace:

Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/b9b831a9-6c10-40bf-86f3-489ed83c81e8/oauth2/token
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
  at sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:91)
  at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1484)
  at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1482)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:782)
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1481)
  at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
  at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
  at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenSingleCall(AzureADAuthenticator.java:254)
  ... 31 more

Cause

The ABFS driver enters a retry loop and becomes unresponsive in either of two cases: ABFS is configured on the cluster with an incorrect value for the property fs.azure.account.oauth2.client.id, or you try to access an explicit path of the form abfss://myContainer@myStorageAccount.dfs.core.windows.net/... where myStorageAccount does not exist. The command does eventually fail, but because it retries so many times, it appears to hang.
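For orientation, the OAuth settings involved can be sketched as a plain Python dictionary. This is a minimal sketch, not the configuration of any particular cluster: the helper name build_abfs_oauth_conf and its parameters are hypothetical, the property names are the standard Hadoop ABFS OAuth keys in their account-agnostic form, and the endpoint follows the URL pattern visible in the stack trace above.

```python
from typing import Dict

# Token provider class in the Hadoop ABFS driver for service principal
# (client credentials) authentication.
CLIENT_CREDS_PROVIDER = (
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
)


def build_abfs_oauth_conf(
    client_id: str, client_secret: str, tenant_id: str
) -> Dict[str, str]:
    """Return the ABFS OAuth properties as a dict (hypothetical helper).

    Assumption: the account-agnostic property names are used. Your setup may
    instead suffix each key with the storage account host name.
    """
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": CLIENT_CREDS_PROVIDER,
        # A wrong value here is one of the two triggers for the retry loop.
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        # Same URL shape as the token request in the stack trace.
        "fs.azure.account.oauth2.client.endpoint": (
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
        ),
    }
```

How these values are applied (cluster Spark config or session configuration) depends on your environment. Building them in one place makes it easier to compare the client ID and tenant ID against the values registered in Azure Active Directory.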

If you try to access an incorrect path with an existing storage account, you will see a 404 error message. The system does not hang in this case.

Solution

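You must verify the accuracy of all credentials when accessing ABFS data, in particular the value of fs.azure.account.oauth2.client.id. You must also verify that the ABFS path you are trying to access exists, including the storage account named in it. If either of these is incorrect, the problem occurs.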
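Some of this verification can be done before running the command. The following is a minimal sketch under stated assumptions: all function names are hypothetical, the client ID is assumed to be a GUID in canonical hyphenated form, and a storage account that does not exist is assumed to fail DNS resolution for its dfs.core.windows.net host name. Passing these checks does not prove the credentials are correct; it only catches malformed values early.

```python
import re
import socket
import uuid
from typing import NamedTuple


class AbfsPath(NamedTuple):
    container: str
    storage_account: str
    path: str


# Matches abfs:// or abfss:// URIs of the form
# <scheme>://<container>@<account>.dfs.core.windows.net/<path>
_ABFS_RE = re.compile(
    r"^abfss?://([^@/]+)@([^./@]+)\.dfs\.core\.windows\.net(/.*)?$"
)


def parse_abfs_uri(uri: str) -> AbfsPath:
    """Split an ABFS URI into container, storage account, and path."""
    match = _ABFS_RE.match(uri)
    if match is None:
        raise ValueError(f"Not a well-formed ABFS URI: {uri!r}")
    container, account, path = match.groups()
    return AbfsPath(container, account, path or "/")


def is_guid(value: str) -> bool:
    """Return True if value is a GUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and unhyphenated hex, so compare
    # against the canonical string to reject those variants.
    return str(parsed) == value.lower()


def storage_account_resolves(account: str) -> bool:
    """Check whether the account's DFS endpoint resolves in DNS.

    Assumption: a nonexistent storage account has no DNS record.
    """
    host = f"{account}.dfs.core.windows.net"
    try:
        socket.getaddrinfo(host, 443)
    except socket.gaierror:
        return False
    return True
```

For example, parse_abfs_uri("abfss://myContainer@myStorageAccount.dfs.core.windows.net/data") yields the storage account name myStorageAccount, which can then be passed to storage_account_resolves from a machine with network access.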