You can use the Databricks Workspace API (AWS | Azure | GCP) to recursively list all workspace objects under a given path.
Common use cases for this include:
- Indexing all notebook names and types for all users in your workspace.
- Using the output, in conjunction with other API calls, to delete unused workspaces or to manage notebooks.
- Dynamically getting the absolute path of a notebook under a given user and submitting it to the Databricks Jobs API to trigger notebook-based jobs (AWS | Azure | GCP), as sketched below.
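As an illustration of the last use case, a one-time notebook run can be triggered by POSTing a notebook path to the Jobs API runs/submit endpoint. The following is a minimal sketch, not a complete solution: the instance name, token, cluster ID, and notebook path are all placeholders you would substitute with your own values.
%python
import requests

# Minimal sketch: trigger a one-time run of a notebook via the Jobs API.
# <instance-name>, <token>, and <cluster-id> are placeholders, and the
# notebook path shown here is hypothetical.
headers = {'Authorization': 'Bearer <token>'}
payload = {
    "run_name": "kb-example-run",
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {"notebook_path": "/Users/someone@example.com/my-notebook"},
}
response = requests.post(
    "https://<instance-name>/api/2.0/jobs/runs/submit",
    headers=headers,
    json=payload,
)
response.raise_for_status()
print(response.json())  # Contains the run_id of the submitted run.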
Define function
This example code defines the rec_req function and the logic needed to run it.
You should place this code at the beginning of your notebook.
You need to replace <token> with your personal access token (AWS | Azure | GCP).
%python
import requests

# Authorization headers. Replace <token> with your personal access token.
headers = {
    'Authorization': 'Bearer <token>',
}

# Define rec_req as a function.
# Note: The default path is "/", which scans all users and folders.
def rec_req(instanceName, loc="/"):
    url = '{}/api/2.0/workspace/list'.format(instanceName)
    response = requests.get(url, headers=headers, params={'path': loc})
    # Raise an exception if the directory or URL does not exist.
    response.raise_for_status()
    jsonResponse = response.json()
    for data in jsonResponse.get('objects', []):
        if data['object_type'] == 'DIRECTORY':
            # Recurse into the folder.
            rec_req(instanceName, data['path'])
        elif data['object_type'] == 'NOTEBOOK':
            # Print the notebook metadata, including its path.
            print(data)
        else:
            # Skip other object types, such as libraries.
            pass
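Hardcoding a personal access token in a notebook is convenient for testing, but a Databricks secret scope is a safer place to keep it. The following sketch assumes a secret scope named kb-scope with a key named pat; both names are hypothetical and would need to be created first with the Secrets API or CLI.
%python
# Sketch: read the personal access token from a secret scope instead of
# hardcoding it. "kb-scope" and "pat" are hypothetical names.
token = dbutils.secrets.get(scope="kb-scope", key="pat")
headers = {'Authorization': 'Bearer {}'.format(token)}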
Run function
Once you have defined the function in your notebook, you can call it at any time.
You need to replace <instance-name> with the instance name (AWS | Azure | GCP) of your Databricks deployment. This is typically the workspace URL, without any workspace ID.
You need to replace <path> with the full path you want to search. This is typically /.
%python
rec_req("https://<instance-name>", "<path>")
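If you want to feed the results into other API calls rather than print them, a small variant that accumulates notebook paths in a list may be more convenient. This is a sketch built on the rec_req logic above; it reuses the headers variable and requests import from the previous section, and the function name is hypothetical.
%python
# Sketch: same traversal as rec_req, but accumulate notebook paths in a list
# so they can be passed to other API calls (deletion, job submission, and so on).
def collect_notebooks(instanceName, loc="/", results=None):
    if results is None:
        results = []
    url = '{}/api/2.0/workspace/list'.format(instanceName)
    response = requests.get(url, headers=headers, params={'path': loc})
    response.raise_for_status()
    for obj in response.json().get('objects', []):
        if obj['object_type'] == 'DIRECTORY':
            # Recurse into the folder, passing the shared results list.
            collect_notebooks(instanceName, obj['path'], results)
        elif obj['object_type'] == 'NOTEBOOK':
            results.append(obj['path'])
    return results

notebook_paths = collect_notebooks("https://<instance-name>", "/")
print(len(notebook_paths), "notebooks found")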