403 error when attempting to synchronize users through an Azure Databricks SCIM provisioning connector

Automate updating the Databricks IP access list using the Databricks Accounts API.

Written by ankit.raj

Last published at: March 3rd, 2025

Problem

You want to make your Databricks account portal inaccessible from the internet while still allowing connections from Microsoft Entra ID to Databricks. You have IP access control lists (ACLs) enabled at the account level, and you follow the steps to allowlist Microsoft Entra ID IPs as recommended in the Tutorial: Develop and plan provisioning for a SCIM endpoint in Microsoft Entra ID documentation. This requires adding all Microsoft Entra ID IP addresses to the Databricks IP access list so that the Microsoft Entra ID service can connect for SCIM provisioning.

During this setup, you encounter a 403 error when you attempt to synchronize users through an Azure Databricks SCIM provisioning connector. 

 

Cause

The IP addresses that the Microsoft Entra ID service uses are dynamic. For Microsoft Entra ID SCIM provisioning to work with an IP ACL in place, the IP ranges listed under the AzureActiveDirectory service tag must be allowlisted. Because these ranges change over time, it is difficult to keep an IP ACL current manually.

 

Solution

Implement an automation script to manage dynamic IP addresses. The script should do the following.

  1. Retrieve the current list of public IP address prefixes for Microsoft Entra ID using the Python requests library.
  2. Update the Databricks IP access list using the account IP access lists API (with a PATCH request).
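The first step comes down to reading the address prefixes listed under the AzureActiveDirectory tag in the service tags file. The following is a minimal sketch of that extraction, not a definitive implementation. The sample data only imitates the shape of the file, the prefixes are documentation-only ranges rather than real Microsoft Entra ID addresses, and extract_ipv4_prefixes is a hypothetical helper name. It keeps IPv4 prefixes only, as the example script later in this article does.

```python
import ipaddress

# Illustrative sample shaped like the service tags file. The prefixes are
# documentation-only ranges, not real Microsoft Entra ID addresses.
sample_service_tags = {
    "values": [
        {
            "name": "AzureActiveDirectory",
            "id": "AzureActiveDirectory",
            "properties": {
                "addressPrefixes": ["192.0.2.0/24", "198.51.100.0/25", "2001:db8::/32"]
            },
        },
        {
            "name": "Storage",
            "id": "Storage",
            "properties": {"addressPrefixes": ["203.0.113.0/24"]},
        },
    ]
}


def extract_ipv4_prefixes(service_tags, tag_name="AzureActiveDirectory"):
    """Return the IPv4 prefixes listed under the given service tag (hypothetical helper)."""
    for item in service_tags["values"]:
        if item["name"] == tag_name:
            prefixes = item["properties"]["addressPrefixes"]
            # ip_network() parses both IPv4 and IPv6 CIDR strings; keep version 4 only
            return [p for p in prefixes if ipaddress.ip_network(p, strict=False).version == 4]
    return []


print(extract_ipv4_prefixes(sample_service_tags))
```

Parsing each prefix with the ipaddress module is a slightly stricter alternative to checking for a ":" character, because malformed entries raise an error instead of passing through.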

Best practices while implementing your script include the following.

  • Regularly download the updated IP ranges file from Microsoft and update the IP access list accordingly. This file is updated weekly, and new ranges will not be used in Azure for at least one week.
  • Test the automation script rigorously in a lower environment before implementing it in production.
  • Take a backup of the current IP access list before the first implementation.
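One way to take the backup is to write the current IP access list to a timestamped JSON file before applying any change. The following is a minimal sketch under stated assumptions: backup_ip_access_list is a hypothetical helper, and current_acl is a made-up entry that uses the same fields the example script reads and writes (list_id, label, list_type, ip_addresses, enabled). In practice you would pass the object returned by the API instead.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_ip_access_list(acl, backup_dir):
    """Write one IP access list (a dict) to a timestamped JSON file and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(backup_dir) / f"ip-access-list-{acl['label']}-{stamp}.json"
    path.write_text(json.dumps(acl, indent=2))
    return path


# Hypothetical list entry for illustration only
current_acl = {
    "list_id": "<list-id>",
    "label": "AzureActiveDirectory",
    "list_type": "ALLOW",
    "ip_addresses": ["192.0.2.0/24"],
    "enabled": True,
}

# A temporary directory is used here; choose a durable location in practice
backup_path = backup_ip_access_list(current_acl, tempfile.mkdtemp())
print(f"Backup written to {backup_path}")
```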

 

Note

Updating an account-level IP access list requires the account admin role.

 

 

How to run the script

First, preview the changes that the script makes. Dry run is the default, so either of the following commands only previews the changes.

python <your-script-name>.py 

or  

python <your-script-name>.py --dry-run=true

The script outputs a list of IP ranges that will be added to the Databricks IP access list, and a list of IP ranges to be removed from the Databricks IP access list.
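The preview is a pair of set differences between the prefixes currently in the IP access list and the latest prefixes from Microsoft. A small sketch with made-up documentation ranges (diff_prefixes is a hypothetical helper name):

```python
def diff_prefixes(current, latest):
    """Return (to_add, to_remove) as sorted lists."""
    current_set, latest_set = set(current), set(latest)
    return sorted(latest_set - current_set), sorted(current_set - latest_set)


# Made-up prefixes for illustration only
current = ["192.0.2.0/24", "198.51.100.0/24"]
latest = ["198.51.100.0/24", "203.0.113.0/24"]

to_add, to_remove = diff_prefixes(current, latest)
print("New prefixes to add:", to_add)     # ['203.0.113.0/24']
print("Prefixes to remove:", to_remove)   # ['192.0.2.0/24']
```

Prefixes present in both lists, such as 198.51.100.0/24 here, appear in neither output because they are unchanged.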

Then, apply the changes to the Databricks IP access list. 

python <your-script-name>.py --dry-run=false
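Python's built-in bool treats any non-empty string as true, so bool("false") evaluates to True. A script that accepts --dry-run=false therefore needs to parse the string explicitly. Otherwise the flag silently stays in dry-run mode. A minimal sketch of such a parser, where str_to_bool is a hypothetical helper name:

```python
import argparse


def str_to_bool(value):
    """Parse common true/false spellings and reject anything else."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"Expected true or false, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", type=str_to_bool, default=True)

print(parser.parse_args(["--dry-run=false"]).dry_run)  # False
print(parser.parse_args(["--dry-run=true"]).dry_run)   # True
print(parser.parse_args([]).dry_run)                   # True (default)
```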

 

Example script 

You can use the following example script for <your-script-name>.py. It sets constants, finds the URL of the latest service tags file, and reads the IPv4 address prefixes under the AzureActiveDirectory tag. Then it looks up the IP access list ID and current IP addresses by list name. Last, it determines the changes and updates the IP access list with the latest prefixes. The script expects an IP access list labeled AzureActiveDirectory to already exist in the account.

import requests
import json
import re
import argparse

# Constants
MICROSOFT_DOWNLOAD_PAGE = "https://www.microsoft.com/en-us/download/details.aspx?id=56519"
account_id = '<your-Databricks-account-id>'
DATABRICKS_INSTANCE = 'https://accounts.azuredatabricks.net'  # No trailing slash; API paths below start with "/"
DATABRICKS_TOKEN = '<set-authentication-mechanism-following-your-account-config>'
IP_ACCESS_LIST_NAME = "AzureActiveDirectory"

# Function to get the latest service tags JSON URL
def get_latest_service_tags_url():
    headers = {"User-Agent": "curl/7.81.0", "Accept": "*/*"}
    response = requests.get(MICROSOFT_DOWNLOAD_PAGE, headers=headers)
    response.raise_for_status()
    match = re.search(r'ServiceTags_Public_\d+\.json', response.text)
    if not match:
        raise ValueError("Could not find the latest ServiceTags_Public JSON file.")
    filename = match.group(0)
    return f"https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/{filename}"

# Function to get the address prefixes for Azure Active Directory
def get_azure_ad_prefixes(url):
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
    azure_ad_prefixes = []
    for item in data['values']:
        if item['name'] == "AzureActiveDirectory" and item['id'] == "AzureActiveDirectory":
            azure_ad_prefixes = item['properties']['addressPrefixes']
            # Keep IPv4 prefixes only (IPv6 prefixes contain ":")
            azure_ad_prefixes = [prefix for prefix in azure_ad_prefixes if ":" not in prefix]
            break
    return azure_ad_prefixes

# Function to get the IP access list ID by name
def get_ip_access_list_id(access_list_name):
    url = f"{DATABRICKS_INSTANCE}/api/2.0/accounts/{account_id}/ip-access-lists"
    headers = {"Authorization": f"Bearer {DATABRICKS_TOKEN}"}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    access_lists = response.json()['ip_access_lists']
    for acl in access_lists:
        if acl['label'] == access_list_name:
            return acl['list_id'], acl['ip_addresses']
    raise ValueError(f"Access list with label {access_list_name} not found.")

# Function to update the IP access list
def update_ip_access_list(access_list_id, ip_prefixes):
    url = f"{DATABRICKS_INSTANCE}/api/2.0/accounts/{account_id}/ip-access-lists/{access_list_id}"
    headers = {
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        "Content-Type": "application/json"
    }
    payload = {
        "label": IP_ACCESS_LIST_NAME,
        "list_type": "ALLOW",
        "ip_addresses": ip_prefixes,
        "enabled": True
    }
    response = requests.patch(url, headers=headers, data=json.dumps(payload))
    response.raise_for_status()
    return response.json()

# Main script logic
def main(dry_run=True):
    try:
        # Get latest service tags URL
        latest_url = get_latest_service_tags_url()
        print(f"Using Service Tags URL: {latest_url}")

        # Get Azure AD prefixes
        azure_ad_prefixes = get_azure_ad_prefixes(latest_url)
        
        # Get the IP access list ID and current IPs for the given name
        ip_access_list_id, current_ip_prefixes = get_ip_access_list_id(IP_ACCESS_LIST_NAME)
        
        # Determine changes
        new_prefixes = set(azure_ad_prefixes) - set(current_ip_prefixes)
        removed_prefixes = set(current_ip_prefixes) - set(azure_ad_prefixes)
        
        if dry_run:
            print("Dry Run: Changes that will be made")
            print("New prefixes to add:", new_prefixes)
            print("Prefixes to remove:", removed_prefixes)
            return

        # Update the IP access list with the new prefixes
        # De-duplicate and sort so the submitted list is deterministic
        updated_ips = sorted(set(azure_ad_prefixes))
        response = update_ip_access_list(ip_access_list_id, updated_ips)
        print("IP access list updated successfully:", response)

    except Exception as e:
        # Exit with a nonzero status so schedulers can detect the failure
        raise SystemExit(f"An error occurred: {e}")

if __name__ == "__main__":
    
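    parser = argparse.ArgumentParser(description="Update Databricks IP Access List with Azure AD IPs")
    # type=bool would treat any non-empty string (including "false") as True, so parse the text explicitly
    parser.add_argument("--dry-run", type=lambda value: value.strip().lower() in ("true", "1", "yes"), default=True, help="Perform a dry run to show changes without applying them (default: True)")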
    args = parser.parse_args()
    
    main(dry_run=args.dry_run)

 

For more information, refer to the Microsoft Entra on-premises application provisioning to SCIM-enabled apps documentation. 

For quick access to the IP address list file with the AzureActiveDirectory tag, refer to the Azure IP Ranges and Service Tags – Public Cloud webpage.