You can use the Databricks Workspace API (AWS | Azure | GCP) to recursively list all workspace objects under a given path.
Common use cases for this include:
- Indexing all notebook names and types for all users in your workspace.
- Using the output, in conjunction with other API calls, to delete unused workspace objects or to manage notebooks.
- Dynamically getting the absolute path of a notebook under a given user and submitting it to the Databricks Jobs API to trigger notebook-based jobs (AWS | Azure | GCP); a sketch of this appears at the end of this article.
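To see the shape of the response before recursing, you can make a single, non-recursive call to the list endpoint. This is a minimal sketch; the instance name, token, and /Users path are placeholders for your own values. Each returned object carries an object_type and a path, which is what the recursive function below keys on.
%python
import requests

headers = {'Authorization': 'Bearer <token>'}

# List one level of the workspace tree under /Users.
response = requests.get(
    "https://<instance-name>/api/2.0/workspace/list",
    headers=headers,
    params={"path": "/Users"},
)
response.raise_for_status()
print(response.json())
# Example response shape:
# {"objects": [{"object_type": "DIRECTORY",
#               "path": "/Users/someone@example.com",
#               "object_id": 1234567890}, ...]}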
Define function
This example code defines the function and the logic needed to run it.
You should place this code at the beginning of your notebook.
You need to replace <token> with your personal access token (AWS | Azure | GCP).
%python
import requests

# Authorization header. The <token> placeholder is your personal access token.
headers = {
    'Authorization': 'Bearer <token>',
}

# Define rec_req as a recursive function.
# Note: The default path is "/", which scans all users and folders.
def rec_req(instanceName, loc="/"):
    url = '{}/api/2.0/workspace/list'.format(instanceName)
    response = requests.get(url, headers=headers, params={'path': loc})
    # Raise an exception if the directory or URL does not exist.
    response.raise_for_status()
    for obj in response.json().get('objects', []):
        if obj['object_type'] == 'DIRECTORY':
            # Recurse into each folder.
            rec_req(instanceName, obj['path'])
        elif obj['object_type'] == 'NOTEBOOK':
            # Print the notebook's metadata, including its absolute path.
            print(obj)
        else:
            # Skip other object types, such as imported libraries.
            pass
Run function
Once you have defined the function in your notebook, you can call it at any time.
You need to replace <instance-name> with the instance name (AWS | Azure | GCP) of your Databricks deployment. This is typically the base workspace URL, without any trailing path or workspace ID parameter.
You need to replace <path> with the full path you want to search. This is typically /.
%python
rec_req("https://<instance-name>", "<path>")
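If you want to act on the results instead of printing them, you can adapt the function to collect notebook paths into a list and feed them to other APIs, as mentioned in the use cases above. The following is a minimal sketch, not part of the function defined earlier: it assumes the headers defined at the beginning of the notebook, and the <instance-name> and <cluster-id> placeholders, run name, and task key are illustrative values you would replace. It gathers all notebook paths and submits the first one as a one-time run through the Jobs API 2.1 runs/submit endpoint.
%python
import requests

# A variant of rec_req that returns notebook paths instead of printing them.
def collect_notebooks(instanceName, loc="/"):
    url = '{}/api/2.0/workspace/list'.format(instanceName)
    response = requests.get(url, headers=headers, params={'path': loc})
    response.raise_for_status()
    paths = []
    for obj in response.json().get('objects', []):
        if obj['object_type'] == 'DIRECTORY':
            paths.extend(collect_notebooks(instanceName, obj['path']))
        elif obj['object_type'] == 'NOTEBOOK':
            paths.append(obj['path'])
    return paths

notebook_paths = collect_notebooks("https://<instance-name>", "/Users")

if notebook_paths:
    # Submit the first notebook found as a one-time job run.
    # The run name, task key, and cluster ID are placeholders.
    payload = {
        "run_name": "run-from-workspace-listing",
        "tasks": [
            {
                "task_key": "run_notebook",
                "existing_cluster_id": "<cluster-id>",
                "notebook_task": {"notebook_path": notebook_paths[0]},
            }
        ],
    }
    run = requests.post(
        "https://<instance-name>/api/2.1/jobs/runs/submit",
        headers=headers,
        json=payload,
    )
    run.raise_for_status()
    print(run.json())  # Returns a run_id you can poll for run status.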