Introduction
You may require your Databricks jobs to access notebooks stored in external Git providers, such as GitHub or Azure DevOps, using a non-interactive service principal. You need to ensure the service principal can seamlessly authenticate and retrieve the necessary Git credentials from the API to avoid job failure due to missing credentials.
Instructions
When you manage notebooks in an external Git repository, you must give the Databricks environment access to those Git resources. By default, Databricks jobs use a user-based token or interactive authentication to clone or fetch code from Git. For jobs run programmatically using service principals, you need to provide the service principal with valid credentials to authenticate against the Git provider.
First, generate a Personal Access Token (PAT) in your Git provider with appropriate scopes to access the repository.
Next, create an on-behalf-of (OBO) token for your service principal by sending a request to the POST /api/2.0/token-management/on-behalf-of/tokens endpoint.
You can modify and use the following code in a notebook using an interactive compute cluster or from your local terminal.
%sh
# Set up the environment variables (the workspace URL includes the https:// scheme)
export DATABRICKS_WORKSPACE_URL="<workspace-url>"
# Token used to authenticate this request, not the OBO token being created
export AUTH_TOKEN="<OBO-token-or-AAD-token>"
# Create an OBO token for the service principal
curl -X POST "${DATABRICKS_WORKSPACE_URL}/api/2.0/token-management/on-behalf-of/tokens" \
-H "Authorization: Bearer ${AUTH_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
  "application_id": "<service-principal-client-ID>",
  "comment": "...",
  "lifetime_seconds": 360000
}'
For details, refer to the Create on-behalf token (AWS | GCP) API documentation.
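The lifetime_seconds value of 360000 in the request above corresponds to 100 hours. As a minimal sketch, assuming you would rather choose the lifetime in days, the value can be computed with shell arithmetic. The variable names LIFETIME_DAYS and LIFETIME_SECONDS are illustrative and are not part of the API.

```shell
# Illustrative only: derive lifetime_seconds from a lifetime expressed in days.
LIFETIME_DAYS=30
LIFETIME_SECONDS=$(( LIFETIME_DAYS * 24 * 60 * 60 ))

# Print the value to paste into the "lifetime_seconds" field of the request body.
echo "${LIFETIME_SECONDS}"   # prints 2592000
```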
Note
If you use Azure, Databricks recommends using Microsoft Entra ID (formerly Azure Active Directory or AAD) authentication for service principals. Alternatively, you can create a PAT for a service principal. For details on both approaches, refer to the “Manage tokens for a service principal” section of the Manage service principals documentation.
The response includes the new token in the "token_value" field. The following is an example response.
{
"token_value": "<OBO-token>",
"token_info": {
"token_id": "<token-id>",
"creation_time": <timestamp-in-milliseconds>,
"expiry_time": <timestamp-in-milliseconds>,
"comment": "...",
"created_by_id": <id-number>,
"created_by_username": "<user-email>",
"owner_id": <owner-id-number>
}
}
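To avoid copying the token by hand, you can pull "token_value" out of the response in the shell. The following is a minimal sketch, assuming the response body has been captured in a variable. The helper name extract_token_value and the sample values are hypothetical, and the sed pattern only handles a simple string value like the one in the example response.

```shell
# Hypothetical helper: print the "token_value" string from a JSON response on stdin.
extract_token_value() {
  sed -n 's/.*"token_value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response with placeholder values (not real output from the API).
SAMPLE_RESPONSE='{"token_value": "dapi-example-token", "token_info": {"token_id": "abc123"}}'

# Store the extracted token for use in later requests.
OBO_TOKEN="$(printf '%s' "${SAMPLE_RESPONSE}" | extract_token_value)"
export OBO_TOKEN
```

In practice you would pipe the output of the curl command into the helper instead of using a sample string. If jq is installed on your machine or cluster, jq -r '.token_value' is a more robust way to do the same extraction.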
Then, use the token returned in "token_value" to create Git credentials through an API call. You can modify and run the following commands in a notebook using an interactive compute cluster.
%sh
# Set up the environment variables
export DATABRICKS_WORKSPACE_URL="<workspace-url>"
export OBO_TOKEN="<OBO-token>"
# Perform the POST request
curl --location --request POST "${DATABRICKS_WORKSPACE_URL}/api/2.0/git-credentials" \
--header "Authorization: Bearer ${OBO_TOKEN}" \
--header "Content-Type: application/json" \
--data-raw '{
"git_provider": "<GIT-PROVIDER>",
"personal_access_token": "<GIT-TOKEN>",
"git_username": "<GIT-USERNAME>"
}'
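After the credential is created, you may want to confirm that it is stored for the service principal. The following is a minimal sketch, assuming the same DATABRICKS_WORKSPACE_URL and OBO_TOKEN variables are set and that a GET request to the same /api/2.0/git-credentials path lists the credentials. The helper names list_git_credentials and has_git_provider are hypothetical, and the response shape used for the check is an assumption.

```shell
# Hypothetical helper: list the Git credentials visible to the token in OBO_TOKEN.
list_git_credentials() {
  curl --silent --show-error --fail --request GET \
    "${DATABRICKS_WORKSPACE_URL}/api/2.0/git-credentials" \
    --header "Authorization: Bearer ${OBO_TOKEN}"
}

# Hypothetical helper: succeed if the JSON on stdin contains the given provider
# in a "git_provider" field (assumed response shape).
has_git_provider() {
  grep -q "\"git_provider\"[[:space:]]*:[[:space:]]*\"$1\""
}
```

For example, list_git_credentials | has_git_provider "<GIT-PROVIDER>" && echo "credential found" prints a confirmation only when a credential for that provider is listed.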
Note
You can also save the value of "token_value" in a Databricks secret for safety and reference it in the code. For more information, refer to the Secret management (AWS | Azure | GCP) documentation.
Finally, when you set up the job in Databricks, specify your service principal as the job’s run-as user. The job then runs as the service principal and can use its Git credentials. For more information on creating Git credentials, refer to the Create a credential entry (AWS | Azure | GCP) API documentation.
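The run-as setting can also be applied through the API rather than the UI. The following is a rough sketch, assuming the Jobs API 2.1 update endpoint accepts a run_as object with a service_principal_name field set to the service principal's application ID. The helper names, the API_TOKEN variable, and the job ID are hypothetical, so check the Jobs API documentation for your workspace before relying on it.

```shell
# Hypothetical helper: print a jobs/update request body that sets run_as.
# $1 = job ID (number), $2 = service principal application (client) ID.
build_run_as_payload() {
  printf '{"job_id": %s, "new_settings": {"run_as": {"service_principal_name": "%s"}}}\n' "$1" "$2"
}

# Hypothetical helper: send the body to the Jobs API.
# Assumes API_TOKEN holds a token for an identity allowed to edit the job.
set_job_run_as() {
  build_run_as_payload "$1" "$2" | curl --silent --show-error --fail --request POST \
    "${DATABRICKS_WORKSPACE_URL}/api/2.1/jobs/update" \
    --header "Authorization: Bearer ${API_TOKEN}" \
    --header "Content-Type: application/json" \
    --data @-
}
```

For example, set_job_run_as <job-id> "<service-principal-client-ID>" sends the update, while build_run_as_payload alone lets you inspect the request body first.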