Problem
Access to Azure Data Lake Storage Gen1 (ADLS Gen1) fails with the error ADLException: Error getting info for file <filename> when all of the following network configuration is in place:
- Azure Databricks workspace is deployed in your own virtual network (uses VNet injection).
- Traffic is allowed via Azure Data Lake Storage credential passthrough.
- ADLS Gen1 storage firewall is enabled.
- Azure Active Directory (Azure AD) service endpoint is enabled for the Azure Databricks workspace’s virtual network.
Cause
The Azure Databricks control plane runs in its own virtual network and is responsible for obtaining tokens from Azure AD. ADLS credential passthrough relies on the control plane to obtain the Azure AD token that authenticates the interactive user with ADLS Gen1.
When you deploy your Azure Databricks workspace in your own virtual network (using VNet injection), Azure Databricks clusters are created in that virtual network. For increased security, you can restrict access to the ADLS Gen1 account by enabling an Azure AD service endpoint on the virtual network and configuring the ADLS Gen1 firewall to allow only requests from that virtual network.
However, ADLS credential passthrough fails in this case. The reason is that when ADLS Gen1 checks for the virtual network where the token was created, it finds the network to be the Azure Databricks control plane and not the customer-provided virtual network where the original passthrough call was made.
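The mismatch described above can be illustrated with a small toy model. This is only a sketch of the reasoning, not how the ADLS Gen1 firewall is actually implemented; the network labels, the Token class, and the firewall_allows function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Set

# Hypothetical labels standing in for the two virtual networks involved.
CONTROL_PLANE_VNET = "databricks-control-plane-vnet"
CUSTOMER_VNET = "customer-vnet"


@dataclass(frozen=True)
class Token:
    user: str
    issued_from_vnet: str  # network from which the Azure AD token was obtained


def firewall_allows(token: Token, allowed_vnets: Set[str]) -> bool:
    """Toy rule: accept only tokens obtained from an allowed virtual network."""
    return token.issued_from_vnet in allowed_vnets


# With credential passthrough, the control plane obtains the token,
# even though the user's request starts on a cluster in the customer VNet.
passthrough_token = Token(user="user@example.com", issued_from_vnet=CONTROL_PLANE_VNET)

# The firewall is configured to allow only the customer-provided VNet.
allowed = {CUSTOMER_VNET}

print(firewall_allows(passthrough_token, allowed))  # False: request is rejected
```

In this model the request is rejected because the token's network of origin is the control plane, which is not in the allowed set, even though the cluster that issued the read runs inside the allowed virtual network.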
Solution
To use ADLS credential passthrough with a service endpoint, storage firewall, and ADLS Gen1, enable Allow access to Azure services in the firewall settings.
If you have security concerns about enabling this setting in the firewall, you can upgrade to ADLS Gen2. ADLS Gen2 works with the network configuration described above.
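As a rough sketch of why the firewall setting resolves the failure, the toy function below adds an allow_azure_services flag to a simplified firewall decision. It rests on the assumption that the control plane is treated as an Azure service once the setting is enabled; the function name, parameters, and network labels are hypothetical and do not correspond to any Azure API.

```python
from typing import Set

# Hypothetical labels standing in for the two virtual networks involved.
CONTROL_PLANE_VNET = "databricks-control-plane-vnet"
CUSTOMER_VNET = "customer-vnet"


def request_allowed(token_vnet: str, allowed_vnets: Set[str],
                    allow_azure_services: bool) -> bool:
    """Simplified stand-in for the ADLS Gen1 firewall decision."""
    if token_vnet in allowed_vnets:
        return True
    # Assumption: with the setting enabled, tokens obtained by the
    # Azure Databricks control plane are accepted as Azure service traffic.
    return allow_azure_services and token_vnet == CONTROL_PLANE_VNET


# Setting disabled: the passthrough token is rejected.
print(request_allowed(CONTROL_PLANE_VNET, {CUSTOMER_VNET}, allow_azure_services=False))

# Setting enabled: the same token is accepted.
print(request_allowed(CONTROL_PLANE_VNET, {CUSTOMER_VNET}, allow_azure_services=True))
```

The trade-off this sketch makes visible is that enabling the setting widens what the firewall accepts beyond your own virtual network, which is the security concern noted above.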
- Deploy Azure Databricks in your Azure virtual network (VNet injection)
- Access Azure Data Lake Storage using Azure Active Directory credential passthrough