KeyProviderException error when trying to create an external table on an external schema with authentication at the notebook level

Set up authorization at the cluster configuration level instead.

Written by Ernesto Calderón

Last published at: January 31st, 2025

Problem

You’re trying to use a notebook with a service principal for authentication to create an external table on an external schema in a Hive metastore, and you receive an error.

 

"KeyProviderException: Failure to initialize configuration for storage account <storage-account>.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key".

 

Cause

You have an invalid configuration value for fs.azure.account.key. Invalid configuration values occur when the service principal hasn’t been granted proper access to the Azure storage account.

 

Solution

Set up authorization at the cluster configuration level instead of the notebook level by applying the Apache Spark configuration through a cluster-scoped init script. Create an init script with the following code and add it to the cluster configuration.

 

```
#!/bin/bash
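# Cluster-scoped init script: append the service principal OAuth settings for
# ADLS Gen2 to the driver's custom Spark configuration file.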
cat <<EOF >> /databricks/driver/conf/00-custom-spark.conf
[driver] {
  "spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net" = "OAuth"
  "spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net" = "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
  "spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net" = "<service-principal-app-id>"
  "spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net" = "<service-principal-secret>"
  "spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net" = "https://login.microsoftonline.com/<Azure-AD-tenant-directory-id>/oauth2/token"
}
EOF
```

 

If you store your service principal credentials in Databricks secrets, modify the init script to reference the secrets through the clientId and clientSec variables instead, and add the corresponding environment variables to the cluster configuration.

 

```
#!/bin/bash
# clientId and clientSec reference the Databricks secrets that store the service
# principal's application (client) ID and client secret.
clientId="{{secrets/<secret-scope>/<service-credential-key>}}"
clientSec="{{secrets/<secret-scope>/<service-credential-key>}}"
cat <<EOF >> /databricks/driver/conf/00-custom-spark.conf
[driver] {
  "spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net" = "OAuth"
  "spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net" = "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
  "spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net" = ${clientId}
  "spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net" = ${clientSec}
  "spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net" = "https://login.microsoftonline.com/<Azure-AD-tenant-directory-id>/oauth2/token"
}
EOF
```
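
After you attach the init script, add any required environment variables, and restart the cluster, the operation that previously failed should succeed. The following is a minimal check you can run from a notebook; the storage account, container, and path names are placeholders.

```
# Confirm the driver picked up the cluster-level OAuth configuration.
print(spark.sparkContext.getConf().get(
    "spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net"))

# Listing the external location should now succeed without a KeyProviderException,
# and re-running the CREATE TABLE statement from the Problem section should work.
display(dbutils.fs.ls("abfss://<container>@<storage-account>.dfs.core.windows.net/<path>"))
```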