List all workspace objects

List all Databricks workspace objects under a given path.

Written by Adam Pavlacka

Last published at: May 19th, 2022

You can use the Databricks Workspace API to list the objects in a workspace folder. By calling it recursively, you can list all workspace objects under a given path.

Common use cases for this include:

  • Indexing all notebook names and types for all users in your workspace.
  • Using the output, in conjunction with other API calls, to delete unused workspace objects or to manage notebooks.
  • Dynamically getting the absolute path of a notebook under a given user and submitting it to the Databricks Jobs API to trigger notebook-based jobs.
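For the third use case, once you have a notebook's absolute path, you can submit it as a one-time run. The sketch below only builds the request body. It assumes the Jobs API 2.0 runs/submit request format (newer Jobs API versions use a "tasks" array instead), and build_notebook_run is a hypothetical helper. The cluster ID is a placeholder for an existing cluster in your workspace.

```python
import json


def build_notebook_run(notebook_path, cluster_id, run_name="notebook-run"):
    """Build a Jobs API 2.0 runs/submit request body for a single notebook.

    Assumption: the run targets an existing cluster identified by cluster_id.
    """
    return {
        "run_name": run_name,
        "existing_cluster_id": cluster_id,
        "notebook_task": {"notebook_path": notebook_path},
    }


# Placeholder values: replace with a real notebook path and cluster ID.
payload = build_notebook_run("/Users/someone@example.com/etl", "<cluster-id>")
print(json.dumps(payload, indent=2))
```

You could then send the body with requests.post(url, headers=headers, json=payload), where url is your instance name followed by /api/2.0/jobs/runs/submit.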

Define function

This example code defines the function and the logic needed to run it.

You should place this code at the beginning of your notebook.

You need to replace <token> with your personal access token.
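If you would rather not paste the token into the notebook, one option is to read it from an environment variable. This is a minimal sketch: auth_headers is a hypothetical helper and the variable name DATABRICKS_TOKEN is an assumption. In a Databricks notebook you could instead store the token in a secret scope and read it with dbutils.secrets.get(scope=..., key=...).

```python
import os


def auth_headers(env_var="DATABRICKS_TOKEN"):
    """Build the Authorization header from a token stored in an environment variable.

    Assumption: the personal access token has been exported as env_var.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your personal access token."
        )
    return {"Authorization": f"Bearer {token}"}
```

To use it, assign headers = auth_headers() in place of the hard-coded headers dictionary.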

%python

import requests

# Authorization: replace <token> with your personal access token.
headers = {
    'Authorization': 'Bearer <token>',
}

# Define rec_req as a function.
# Note: The default path is "/", which scans all users and folders.

def rec_req(instanceName, loc="/"):
    url = '{}/api/2.0/workspace/list'.format(instanceName)
    # Send the folder to list as the "path" query parameter.
    response = requests.get(url, headers=headers, params={'path': loc})
    # Raise an exception if a directory or URL does not exist.
    response.raise_for_status()
    # Each listed object is a dict with keys such as "path" and "object_type".
    # An empty folder may return a response without an "objects" key.
    for data in response.json().get('objects', []):
        if data['object_type'] == 'DIRECTORY':
            # Recurse into the subfolder.
            rec_req(instanceName, data['path'])
        elif data['object_type'] == 'NOTEBOOK':
            # Print the notebook's metadata, including its path.
            print(data)
        else:
            # Skip other object types, such as libraries.
            pass
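If you want to collect notebooks instead of printing them, or avoid deep recursion, the same traversal can be written with an explicit queue. This is a sketch under assumptions: walk_workspace and make_lister are hypothetical names, and list_objects is any function that returns the "objects" list for a path. Separating the HTTP call from the traversal also lets you test the traversal without a workspace.

```python
from collections import deque


def walk_workspace(list_objects, root="/"):
    """Yield every notebook object under root, breadth-first, without recursion.

    list_objects(path) must return the list of objects in that folder
    (an empty list for an empty folder).
    """
    pending = deque([root])
    while pending:
        path = pending.popleft()
        for obj in list_objects(path):
            if obj["object_type"] == "DIRECTORY":
                # Queue the subfolder instead of recursing into it.
                pending.append(obj["path"])
            elif obj["object_type"] == "NOTEBOOK":
                yield obj
            # Other object types (for example, libraries) are skipped.


def make_lister(instance, headers):
    """Return a list_objects function backed by the Workspace API list endpoint."""
    import requests  # Imported here so walk_workspace can be used without requests.

    def list_objects(path):
        response = requests.get(
            f"{instance}/api/2.0/workspace/list",
            headers=headers,
            params={"path": path},
        )
        response.raise_for_status()
        return response.json().get("objects", [])

    return list_objects
```

For example: for nb in walk_workspace(make_lister("https://<instance-name>", headers)): print(nb["path"]). Because walk_workspace is a generator, you can also wrap it in list() to keep all results.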

Run function

Once you have defined the function in your notebook, you can call it at any time.

You need to replace <instance-name> with the instance name of your Databricks deployment. This is typically the host name from your workspace URL, without the https:// prefix or any workspace ID.

You need to replace <path> with the full path you want to search. To search the entire workspace, use /.

%python


rec_req("https://<instance-name>", "<path>")
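To index notebooks by user (the first use case), you can group the notebook dictionaries that the listing returns. This sketch assumes you have collected them into a list instead of printing them, and that user notebooks live under /Users/<user-name>/. index_by_user is a hypothetical helper; notebooks outside /Users are grouped under the key None.

```python
from collections import defaultdict


def index_by_user(notebooks):
    """Group notebook objects by the user folder they are stored in.

    Returns a dict mapping user name (or None) to a list of (path, language) tuples.
    """
    index = defaultdict(list)
    for nb in notebooks:
        # "/Users/alice@example.com/nb" splits into ["", "Users", "alice@example.com", "nb"].
        parts = nb["path"].split("/")
        user = parts[2] if len(parts) > 3 and parts[1] == "Users" else None
        index[user].append((nb["path"], nb.get("language")))
    return dict(index)
```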

Info

You should NOT include a trailing / as the last character of the instance name. If you do, the request URL contains a double slash (//api/2.0/workspace/list) and the function generates an error.
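One way to guard against this is to normalize the instance name before passing it to rec_req. This is a minimal sketch; normalize_instance is a hypothetical helper, and adding an https:// prefix to a bare host name is an assumption about how you want bare names handled.

```python
def normalize_instance(instance):
    """Return the instance URL with surrounding whitespace and trailing slashes removed.

    Assumption: a bare host name should be prefixed with https://.
    """
    instance = instance.strip().rstrip("/")
    if not instance.startswith(("http://", "https://")):
        instance = "https://" + instance
    return instance
```

For example, rec_req(normalize_instance("https://<instance-name>/"), "/") calls the API without the double slash.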

