How to programmatically identify account admins using the Databricks REST API

Use a Databricks notebook to get all users assigned the account admin role when the total user count is above 10,000.

Written by guanlin.zhang

Last published at: June 27th, 2025

How-to Introduction

You want to know which users are assigned the account admin role in your Databricks environment. 

 

In smaller-scale operations, the easiest way to check is through your account admin portal. You can navigate to User Management and check the Only account admins checkbox.

 

When your user base grows significantly, Databricks offers a REST API call to list users programmatically instead. For details and usage, refer to the List users (AWS | Azure) API documentation.

 

By design, this API call has a soft limit of 10,000 results per page. To retrieve the full user list when there are over 10,000 results, use the API’s startIndex and count parameters to paginate through all users and apply a filter for users with the account admin role.
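
For illustration, here is a minimal pagination sketch (not part of the original example) showing how startIndex and count map onto successive pages. The page size of 2,000 and the user total are arbitrary example values.

# Minimal pagination sketch: startIndex is 1-based and count is the page size.
# With 10,500 users and a page size of 2,000, the requests would use
# startIndex=1, 2001, 4001, 6001, 8001, and 10001.
page_size = 2000
total_users = 10500  # example value; in practice, read totalResults from the API

for start_index in range(1, total_users + 1, page_size):
    print(f"GET /api/2.0/account/scim/v2/Users?startIndex={start_index}&count={page_size}")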

 

Instructions

To begin, Databricks recommends using OAuth Machine-to-Machine (M2M) authentication when using the REST API. For more information, review the Authorize unattended access to Databricks resources with a service principal using OAuth (AWS | Azure) documentation. This documentation explains how to use the client_id and client_secret to generate an access token.

 

Next, in the following example code, store client_id and client_secret as secrets. Also set your workspace URL as a value for the variable workspace_url. (If you work in a notebook, the code retrieves the URL dynamically.)

 

Replace the following variables before running the example code.

  • <oidc-scope> is the secret scope where you store the client ID and client secret.
  • <oidc-client-id> is the key under which your client ID secret value is stored.
  • <oidc-client-secret> is the key under which your client secret value is stored.

 

import requests

# Retrieve the OAuth client ID and secret from your Databricks secret scope
client_id = dbutils.secrets.get(scope="<oidc-scope>", key="<oidc-client-id>")
client_secret = dbutils.secrets.get(scope="<oidc-scope>", key="<oidc-client-secret>")

# In a notebook, read the workspace URL from the Spark configuration;
# otherwise, set workspace_url to your workspace URL directly
workspace_url = spark.conf.get("spark.databricks.workspaceUrl")
token_endpoint_url = f"https://{workspace_url}/oidc/v1/token"

if not client_id or not client_secret:
    raise ValueError("The client_id or client_secret secret is not set.")

try:
    # Exchange the client credentials for an OAuth access token
    response = requests.post(
        token_endpoint_url,
        auth=(client_id, client_secret),
        data={"grant_type": "client_credentials", "scope": "all-apis"},
    )
    response.raise_for_status()
    token_data = response.json()

except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
except ValueError as e:
    print(f"Error: {e}")
except KeyError as e:
    print(f"Error: Key {e} not found in response json")

 

Once you have authenticated and defined your workspace_url variable, get the total user count, set a batch count for pagination, and make the API call to fetch the user results with pagination. Store the results in a list. 

import requests


# Step 1: Get the total user count
api_url = f"http://{workspace_url}/api/2.0/account/scim/v2/Users?attributes=id"
total_count_response = requests.get(
    api_url,
    headers={"Authorization": "Bearer " + token_data["access_token"]},
    timeout=900,
)

# Print the time it took to retrieve the data
print(f"Took: {total_count_response.elapsed.total_seconds()} seconds")

# Get the total number of users
total_user_count = total_count_response.json()["totalResults"]

# Set the batch count for pagination, default to 2000
batch_cnt = 2000

# Adjust batch size if the total user count is smaller
if total_user_count <= batch_cnt:
    batch_cnt = total_user_count

# Step 2: Initialize an empty list for user details
user_tuple_list = list()

# Step 3: Loop through users using pagination logic
for i in range(0, total_user_count, batch_cnt):
    start_index = i + 1
    count = batch_cnt
    print("Processing users from index:", start_index, "count:", count)

    # API call to fetch users with pagination
    url = f"https://{workspace_url}/api/2.0/account/scim/v2/Users?attributes=userName,id,active,roles,email&startIndex={start_index}&count={count}"
    user_response = requests.get(
        url,
        headers={"Authorization": "Bearer " + token_data["access_token"]},
        timeout=900,
    )
    # Parse response and append user details to list
    for resource in user_response.json().get("Resources", []):
        roles = [role["value"] for role in resource.get("roles", [])]
        user_tuple_list.append((resource["id"], resource["userName"], roles))
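
As a quick sanity check (an optional addition to the example), confirm that pagination collected every user before moving on.

# Verify that the paginated calls returned the expected number of users
print(f"Collected {len(user_tuple_list)} of {total_user_count} users")
assert len(user_tuple_list) == total_user_count, "Pagination did not return all users"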

 

Use Apache Spark to define a schema for your user data and convert the list you created previously into a DataFrame. Then use Spark SQL to display a table of email addresses filtered on the account admin role.

from pyspark.sql.types import StructField, StringType, StructType, ArrayType

# Define schema for user data
user_schema = StructType([
    StructField("id", StringType(), True),
    StructField("userName", StringType(), True),
    StructField("roles", ArrayType(StringType()), True)
])

# Convert the user list into a DataFrame
user_df = spark.createDataFrame(user_tuple_list, user_schema)
user_df.createOrReplaceTempView("account_user_id_mapping")

display(spark.sql("SELECT userName as email, roles FROM account_user_id_mapping where array_contains(roles,'account_admin')"))
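
Equivalently, if you prefer the DataFrame API over Spark SQL, the same filter can be expressed directly on the DataFrame created above. This is a minimal sketch of an alternative, not part of the original example.

from pyspark.sql.functions import array_contains, col

# Same filter expressed with the DataFrame API instead of Spark SQL
admin_df = (
    user_df
    .filter(array_contains(col("roles"), "account_admin"))
    .select(col("userName").alias("email"), "roles")
)
display(admin_df)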

 

The following image shows the output you see in your notebook after running the example code: a list of emails with the account_admin role.