Automatic identity management (AIM) enablement prep script

Prepare for automatic identity management with the AIM enablement prep script.

Written by Dinesh Pawar

Last published at: May 11th, 2026

About the script

  • The AIM enablement prep script helps account admins identify and resolve External ID and group membership divergences between Databricks and your IdP.
  • The script also identifies which workspaces are identity-federated. Because AIM only functions within identity-federated workspaces, enabling identity federation across all workspaces in the account is optional but recommended.

 

What are divergences?

Automatic Identity Management relies on provisioned identities having an externalId that matches the corresponding principal’s Object ID (unique ID for principal identification) in the Identity Provider (IdP). The externalId tells Databricks which principal in your Identity Provider a given Databricks identity corresponds to. Missing or incorrectly populated externalId values can prevent identity metadata from syncing between your IdP and Databricks and cause duplicate identities to appear in certain parts of the product.

Additionally, group memberships in your IdP and in Databricks may have diverged, since Databricks group memberships are mutable. This divergence can create complications when turning off SCIM.
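As an illustration, the kind of membership divergence the script looks for can be pictured with simple set operations. The member lists below are hypothetical; the real script fetches this data from the Databricks SCIM API and from the IdP:

```python
# Hypothetical member lists; illustrative only, not the script's actual code.
databricks_members = {"alice@example.com", "bob@example.com", "carol@example.com"}
idp_members = {"alice@example.com", "bob@example.com", "dave@example.com"}

# Members present in the Databricks group but missing from the IdP group.
local_only = databricks_members - idp_members

# Members present in the IdP group but missing from the Databricks group.
idp_only = idp_members - databricks_members

print(sorted(local_only))  # ['carol@example.com']
print(sorted(idp_only))    # ['dave@example.com']
```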

AIM is also only available for identity federation-enabled workspaces. Workspaces without identity federation enabled will continue to work when AIM is enabled, but without the benefits provided by AIM.                                                         

To help you identify these issues, our team has built a Python script, run as a notebook, that audits the identities in your Databricks account and flags potential divergences.

Note

Account admin privileges are required to run this script.

 

Why does this matter?

The script helps admins discover identities provisioned in Databricks whose external IDs have no corresponding match in Entra ID. It also helps detect divergences between Databricks and Entra ID group memberships. Some illustrative issues the tool helps detect and resolve:

Issue A: Duplicate identities appearing in the product

  • If two identities with the same name but different sources appear in the Databricks admin UIs, it often means that externalIds are misconfigured.
  • This will cause one account identity and one IdP identity to appear in admin UIs and sharing modals.
  • Which error categories should I fix to resolve this issue?
    • EXTERNAL_ID_NOT_IN_IDP
    • EXTERNAL_ID_MATCH_NAME_MISMATCH
    • NAME_MATCH_EXTERNAL_ID_MISMATCH

Issue B: Group member count in the IdP doesn’t match the count displayed in the Databricks UIs

  • The Databricks UIs show the member count as reported by the IdP.
  • This means members that exist in the Databricks group but not in the IdP group will not appear in the count (even though the membership still works for permissions).
  • Which error categories should I fix to resolve this issue?
    • GROUP_HAS_LOCAL_MEMBERS_WITHOUT_EXTERNAL_ID
    • GROUP_HAS_LOCAL_MEMBERS_WITH_EXTERNAL_ID

Issue C: Provisioning an IdP group is failing

  • When attempting to import an IdP group, you may face an error that says the group already exists in Databricks.
  • This is likely caused by an existing account group reserving the name, since Databricks enforces a unique group name constraint.
  • Which error categories should I fix to resolve this issue?
    • NAME_MATCH_EXTERNAL_ID_MISMATCH

 

Tool overview

This tool scans all users, groups, and service principals provisioned in a Databricks account and logs inconsistencies with their data from the IdP. The logs are written as CSV reports in the divergence/results folder.

The program runs in three phases:

  • Phase 1 (Workspaces compatibility check):
    • Lists all workspaces.
    • Analyzes workspaces to find those that are incompatible with AIM.
  • Phase 2 (Gather identities):
    • Fetches the IDs of all the provisioned identities in Databricks. If TARGET_IDENTITIES is configured, it only fetches the identities specified by TARGET_IDENTITIES.
    • Writes the IDs to intermediate CSVs.
    • If interrupted, this phase restarts from scratch on the next run.
  • Phase 3 (Identities divergence check):
    • Reads the gathered IDs.
    • Compares each identity with its IdP counterpart in concurrent batches.
    • Analyzes the responses of the endpoint and writes Databricks provisioned identities with divergences from the IdP to the output CSVs.
    • Progress is saved after every batch, so if the program crashes, it resumes from the last completed batch.
    • For Automatic Identity Management (AIM) enabled accounts, this performs a sync with the Identity Provider to preemptively fix issues.

Setup and running the tool

  1. Download the divergence ZIP file. In any workspace, click New > Notebook > File > Import. Select the ZIP file and click Import. Open the divergence folder.
  2. Provide the script credentials to authenticate API calls. To do so, create an account-level service principal, give it an account admin role, and generate an OAuth secret. This is all available from the account admin UI. Then run the following commands via the CLI:
    • databricks auth login --host <workspace_url>
    • databricks secrets create-scope divergence
    • databricks secrets put-secret divergence client_id --string-value <CLIENT_ID>
    • databricks secrets put-secret divergence client_secret --string-value <CLIENT_SECRET>
  3. Open python/config.py and fill in the following:
    • ACCOUNTS_HOST: Your Databricks accounts console URL without any additional URL parameters (for example, https://accounts.azuredatabricks.net).
    • ACCOUNT_ID: Your Databricks account ID.
    • SECRET_SCOPE: The name of the secret scope you created above.
  4. Optionally adjust:
    • INCLUDE_USERS, INCLUDE_GROUPS, INCLUDE_SERVICE_PRINCIPALS: Default True. Set to False to skip that identity type.
    • TARGET_IDENTITIES: Run only for specific identities instead of a full scan.
  5. Go to the run_divergence notebook and click Run all.
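Once filled in, python/config.py might look like the following. All values are placeholders; substitute your own, and note that the exact format of TARGET_IDENTITIES is an assumption here (check the comments in config.py itself):

```python
# python/config.py -- illustrative values only; substitute your own.
ACCOUNTS_HOST = "https://accounts.azuredatabricks.net"  # no extra URL parameters
ACCOUNT_ID = "00000000-0000-0000-0000-000000000000"     # your Databricks account ID
SECRET_SCOPE = "divergence"                             # the secret scope created above

# Optional tuning
INCLUDE_USERS = True
INCLUDE_GROUPS = True
INCLUDE_SERVICE_PRINCIPALS = True
TARGET_IDENTITIES = []  # leave empty for a full scan
```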


The script refreshes its access token automatically in the background every 30 minutes, so long runs do not require manual intervention. If the run stops during phase 3, simply click Run all again to resume progress. If you want to restart the run from the beginning, delete the results folder.
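The background token refresh described above can be pictured as a timer loop on a daemon thread. This is a simplified sketch, not the script's actual mechanism, and fetch_token stands in for the OAuth client-credentials call:

```python
import threading
import time

def start_token_refresher(fetch_token, interval_seconds=30 * 60):
    """Refresh the access token on a background thread every interval_seconds.

    fetch_token is any callable returning a fresh token; here it is an
    illustrative stand-in for an OAuth client-credentials request.
    """
    state = {"token": fetch_token()}  # fetch an initial token up front

    def refresh_loop():
        while True:
            time.sleep(interval_seconds)
            state["token"] = fetch_token()  # replace the token in place

    t = threading.Thread(target=refresh_loop, daemon=True)
    t.start()
    return state  # callers read state["token"] for the current token
```

Using a daemon thread means the refresher dies with the notebook run and never blocks shutdown.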

Interpreting tool output

The script produces several files and writes them to the divergence/results folder.

  • divergence_workspaces.csv
    • The workspaces that have incompatibilities with AIM.
  • identities_to_process_<principal_type>.csv
    • IDs of the identities that went through the divergence check.
    • One file per principal type.
  • idp_divergence_<principal_type>.csv
    • The identities that had divergences with their IdP counterpart; see exact output columns below.
    • Only identities with divergences are written to this file. Identities with no divergences are omitted.
    • One file per principal type.
  • idp_divergence_failures.csv
    • The identities that failed the divergence check (and all retries).
    • Principals of all types are aggregated in this file.
  • idp_divergence_progress.json
    • Temporary progress file for progress tracking and crash recovery.
    • Can typically be ignored.
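To triage the reports, you can load a CSV and split apart the semicolon-separated errorCategories column to count identities per category. The sketch below uses inline sample data so it is self-contained; in practice you would open divergence/results/idp_divergence_users.csv:

```python
import csv
import io
from collections import Counter

# Inline sample rows in the users-report format; replace with the real file.
csv_data = """id,username,externalId,externalIdWithUsernameMatch,errorCategories
101,alice@example.com,aaa-111,bbb-222,NAME_MATCH_EXTERNAL_ID_MISMATCH
102,bob@example.com,ccc-333,,EXTERNAL_ID_NOT_IN_IDP;NAME_MATCH_EXTERNAL_ID_MISMATCH
"""

category_counts = Counter()
for row in csv.DictReader(io.StringIO(csv_data)):
    # errorCategories is semicolon-separated; count each category name.
    for category in row["errorCategories"].split(";"):
        category_counts[category] += 1

print(category_counts.most_common())
```

Sorting categories by frequency makes it easy to see which fixes will resolve the most identities.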

 

Workspaces (divergence_workspaces.csv) output columns

  • workspaceId: The Databricks workspace ID.
  • errorCategories: Semicolon-separated error category names (see below).

 

Users (idp_divergence_users.csv) output columns

  • id: The Databricks internal ID.
  • username: The username of the provisioned Databricks user.
  • externalId: The external ID stored in Databricks on the provisioned Databricks user.
  • externalIdWithUsernameMatch: Semicolon-separated external IDs of IdP users that are matched by username.
  • errorCategories: Semicolon-separated error category names (see below).

 

Groups (idp_divergence_groups.csv) output columns

  • id: The Databricks internal ID.
  • groupName: The name of the provisioned Databricks group.
  • externalId: The external ID stored in Databricks on the provisioned Databricks group.
  • externalIdsWithGroupnameMatch: Semicolon-separated external IDs of IdP groups that are matched by group name.
  • localMembersNotInIdpInternalIds: Semicolon-separated internal IDs of group members that exist only in Databricks and have no external ID.
  • externalMembersNotInIdpInternalIds: Semicolon-separated internal IDs of group members that have an external ID but are not members in the IdP group.
  • errorCategories: Semicolon-separated error category names (see below).

 

Service principals (idp_divergence_service_principals.csv) output columns

  • id: The Databricks internal ID.
  • applicationId: The application ID of the provisioned Databricks service principal.
  • externalId: The external ID stored in Databricks on the provisioned Databricks service principal.
  • externalIdWithAppIdMatch: External ID of the IdP service principal that is matched by application ID.
  • errorCategories: Semicolon-separated error category names (see below).

 

Error categories

Each error category below lists a description, the action to take, and the potential issues if left unresolved.

IDENTITY_FEDERATION_DISABLED

  • Description: The workspace does not have identity federation enabled.
  • Action to take: Enable identity federation for the workspace from the account console.
  • Potential issues if unresolved: Account-level identities and IdP identities will not be available in identity-federation-disabled workspaces. These workspaces will still work as before, but without the capabilities of AIM.

EXTERNAL_ID_NOT_IN_IDP

  • Description: The provisioned identity has an external ID set, but it does not match any identity of the same type in the IdP.
  • Action to take: The externalId on the identity is misconfigured and should be updated to a valid externalId or removed altogether. If you update it to a new externalId, make sure no other identities use it. To determine which externalId to update to, see the NAME_MATCH_EXTERNAL_ID_MISMATCH error category.
  • Potential issues if unresolved: If the externalId is supposed to be linked to an IdP identity, you may see duplicate identities (one with an incorrect externalId and one from the IdP).

EXTERNAL_ID_MATCH_NAME_MISMATCH

  • Description: The Databricks identity has an external ID that maps to an identity with a different unique name in the IdP.
  • Action to take: For users and service principals, the username on Databricks needs to be updated; file a support ticket to do so. For groups, check your account to see if any account groups are reserving the group name (Databricks enforces unique group names). If so, consider renaming the account group so the external group can claim the name.
  • Potential issues if unresolved: When users log in, this frequently results in the creation of a second user with the same externalId but a different username. For groups, it will often prevent the external group from syncing its name with its IdP counterpart.

NAME_MATCH_EXTERNAL_ID_MISMATCH

  • Description: The Databricks identity has a unique name match with an IdP identity that does not match its externalId.
  • Action to take: In most cases, the solution is to update the Databricks externalId to match the IdP identity. Double-check whether this is the correct solution; it can vary based on your IdP and local data. See the externalIdWithUsernameMatch, externalIdsWithGroupnameMatch, and externalIdWithAppIdMatch fields, respectively, to find the externalId to update to.
  • Potential issues if unresolved: If the externalId is supposed to be linked to an IdP identity, you may see duplicate identities (one with an incorrect or no externalId and one from the IdP). If you try to provision an IdP group with the same name, it may fail because an account group is already using the name (Databricks enforces unique group names).

GROUP_HAS_LOCAL_MEMBERS_WITHOUT_EXTERNAL_ID

  • Description: The Databricks group has members without an externalId. These members do not have a corresponding membership in the IdP.
  • Action to take: To truly let the IdP be the source of truth, remove any locally added members from the group via SCIM. If a member should be part of the group, create the member in the IdP and add it to the IdP group. See the localMembersNotInIdpInternalIds field in the output for the members affected.
  • Potential issues if unresolved: Members will inherit permissions from the IdP group, but they will not appear in the IdP, which can make auditing permissions difficult. Member counts in the UI only reflect IdP member counts, which won’t include these members.

GROUP_HAS_LOCAL_MEMBERS_WITH_EXTERNAL_ID

  • Description: The Databricks group has members with an externalId, but these members do not have a corresponding membership in the IdP.
  • Action to take: To truly let the IdP be the source of truth, remove any locally added members from the group via SCIM. If a member should be part of the group, add it to the IdP group. See the externalMembersNotInIdpInternalIds field in the output for the members affected.
  • Potential issues if unresolved: Members will inherit permissions from the IdP group, but they will not appear in the IdP, which can make auditing permissions difficult. Member counts in the UI only reflect IdP member counts, which won’t include these members.

 

Update externalId for a principal

To update the externalId, use Account SCIM to perform the operation. It is recommended to log any API calls to ease rollback if any issues come up during the process.

PATCH https://<accountUrl>/api/2.1/accounts/<accountId>/scim/v2/<Users|Groups|ServicePrincipals>/<databricksId>

{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [
    {
      "op": "replace",
      "path": "externalId",
      "value": "<newExternalId>"
    }
  ]
}
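The PATCH call above can be issued with Python's standard library. The helper below is an illustrative sketch (not part of the prep script) that builds the request so you can log it before sending, as recommended; the account URL, IDs, and token are placeholders:

```python
import json
import urllib.request

def build_external_id_patch(accounts_host, account_id, principal_type,
                            databricks_id, new_external_id, token):
    """Build the Account SCIM PATCH request that replaces a principal's externalId.

    principal_type is one of "Users", "Groups", or "ServicePrincipals".
    accounts_host is the accounts console hostname without a scheme,
    e.g. "accounts.azuredatabricks.net".
    """
    url = (f"https://{accounts_host}/api/2.1/accounts/{account_id}"
           f"/scim/v2/{principal_type}/{databricks_id}")
    body = {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {"op": "replace", "path": "externalId", "value": new_external_id}
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PATCH",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

# Usage sketch -- log the request before executing it, to ease rollback:
# req = build_external_id_patch("accounts.azuredatabricks.net", "<accountId>",
#                               "Users", "<databricksId>", "<newExternalId>", token)
# print(req.full_url, req.data)      # log first
# urllib.request.urlopen(req)        # then send
```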

Remove externalId for a group

To remove the externalId for a group, use Account SCIM to perform the operation. It is recommended to log any API calls to ease rollback if any issues come up during the process.

Note

At the moment, Databricks only supports this operation for groups:

PATCH https://<accountUrl>/api/2.1/accounts/<accountId>/scim/v2/Groups/<databricksId>

 
{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [
    {
      "op": "replace",
      "path": "externalId",
      "value": ""
    }
  ]
}