Problem
You’re trying to create an external table on an external schema in a Hive metastore from a notebook that uses a service principal for authentication, and you receive an error.
"KeyProviderException: Failure to initialize configuration for storage account <storage-account>.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key".
Cause
You have an invalid configuration value for fs.azure.account.key. Invalid configuration values occur when the service principal doesn’t have proper access to the Azure storage account.
Solution
Set up authorization at the cluster level instead of the notebook level by applying the Apache Spark configuration through a cluster-scoped init script. Create an init script with the following code and add it to the cluster configuration.
```
#!/bin/bash
# Append the service principal OAuth settings for ABFS to the driver's custom
# Spark configuration so they apply to every workload on the cluster.
cat <<EOF >> /databricks/driver/conf/00-custom-spark.conf
[driver] {
  "spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net" = "OAuth"
  "spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net" = "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
  "spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net" = "<service-principal-app-id>"
  "spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net" = "<service-principal-secret>"
  "spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net" = "https://login.microsoftonline.com/<Azure-AD-tenant-directory-id>/oauth2/token"
}
EOF
```
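Save the script and attach it to the cluster as a cluster-scoped init script, then restart the cluster. As a minimal sketch, assuming you save the script locally as set-abfs-auth.sh (a hypothetical file name) and stage it with the Databricks CLI:

```
# Hypothetical file name and DBFS path; adjust to wherever your workspace
# keeps cluster-scoped init scripts.
databricks fs cp set-abfs-auth.sh dbfs:/databricks/scripts/set-abfs-auth.sh
```

Then reference that path under the cluster's init scripts settings and restart the cluster so the configuration takes effect.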
If you store your service principal credentials in Databricks secrets, modify the init script to read them through the environment variables clientId and clientSec instead of hard-coding them. Define those environment variables in the cluster configuration so they point to the secrets; Databricks resolves the secret references before the init script runs.
```
#!/bin/bash
# clientId and clientSec are environment variables defined in the cluster
# configuration, for example:
#   clientId={{secrets/<secret-scope>/<client-id-key>}}
#   clientSec={{secrets/<secret-scope>/<client-secret-key>}}
# Databricks resolves the secret references at cluster start, so the values
# below are expanded into the Spark configuration when this script runs.
cat <<EOF >> /databricks/driver/conf/00-custom-spark.conf
[driver] {
  "spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net" = "OAuth"
  "spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net" = "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
  "spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net" = "${clientId}"
  "spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net" = "${clientSec}"
  "spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net" = "https://login.microsoftonline.com/<Azure-AD-tenant-directory-id>/oauth2/token"
}
EOF
```
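For reference, a minimal sketch of storing the credentials as secrets with the legacy Databricks CLI; the scope and key names (<secret-scope>, <client-id-key>, <client-secret-key>) are placeholders, not fixed names:

```
# Legacy Databricks CLI syntax; create a scope and store the service
# principal credentials under the keys the environment variables reference.
databricks secrets create-scope --scope <secret-scope>
databricks secrets put --scope <secret-scope> --key <client-id-key> --string-value "<service-principal-app-id>"
databricks secrets put --scope <secret-scope> --key <client-secret-key> --string-value "<service-principal-secret>"
```

After defining the environment variables and attaching the updated init script, restart the cluster and retry creating the external table from the notebook.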