In the Databricks Jobs API 2.0 (AWS | Azure | GCP), list returns an unbounded number of job descriptions.
In the Jobs API 2.1 (AWS | Azure | GCP), this behavior has changed. The list command now returns at most 25 jobs per call, ordered from newest to oldest.
In this article, we show you how to manually iterate through all of the jobs in your workspace.
Instructions
1) Determine the total number of jobs in your workspace
- Click Workflows in the sidebar.
- Scroll to the bottom of the page.
- The total number of jobs in the workspace is listed in the bottom right.
2) Determine the values to use for offset and limit
The list command accepts two parameters, limit and offset. offset sets the number of jobs that are skipped before the first one is returned. limit sets the number of jobs (up to 25) that are returned. By using the two parameters together, you can retrieve specific jobs out of the total.
For example, if there are 20 total jobs in the workspace and you specify a limit of 10 and an offset of 0, list returns jobs 1-10 (the 10 most recent jobs created, not the most recent job runs). Alternatively, if you specify a limit of 10 and an offset of 10, list returns jobs 11-20.
You should consider the total number of jobs in your workspace and choose values for limit and offset that allow you to easily iterate through the total number of jobs.
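The offset arithmetic can be sketched as a small shell function that prints one list URL per call. This is only an illustration: print_list_urls is a hypothetical helper name, <databricks-instance> stays a placeholder, and the total is the number you read in step 1. With 20 jobs and a limit of 10, it produces the two calls described in the example above (offset 0 and offset 10).

```shell
# Hypothetical helper: print one jobs/list URL per call needed to cover
# every job, moving offset forward by limit on each call.
print_list_urls() {
  local total="$1"   # total number of jobs, from step 1
  local limit="$2"   # jobs per call (25 at most)
  local offset=0
  while [ "$offset" -lt "$total" ]; do
    echo "https://<databricks-instance>/api/2.1/jobs/list?limit=${limit}&offset=${offset}"
    offset=$((offset + limit))
  done
}

# 20 jobs, 10 per call: offsets 0 and 10
print_list_urls 20 10
```

The number of calls is the total divided by the limit, rounded up. For example, 60 jobs at 25 per call needs three calls, with offsets 0, 25, and 50.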
3) Iterate through the jobs
Call list repeatedly until every job in the workspace has been returned. In this article, we iterate through all of the jobs from a notebook, using curl to access the API. We assume the number of jobs is large, so each call returns the maximum of 25 jobs.
Review the Authentication using Databricks personal access tokens (AWS | Azure | GCP) documentation for more information on creating and using personal access tokens.
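For the first call, use limit=25 and no page_token:
curl --location --header 'Authorization: Bearer <personal-access-token>' --request GET 'https://<databricks-instance>/api/2.1/jobs/list?limit=25'
For each subsequent call, pass the next_page_token value from the previous response as page_token:
curl --location --header 'Authorization: Bearer <personal-access-token>' --request GET 'https://<databricks-instance>/api/2.1/jobs/list?limit=25&page_token=<page_token>'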
Continue iterating, 25 jobs at a time, until the response no longer reports has_more as true. At that point, all of the jobs have been displayed.
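The page_token loop in step 3 can be sketched as a pair of shell functions for a %sh cell (paste them below the %sh line). This sketch rests on assumptions the article does not state: the workspace URL and token are read from the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN, fetch_page and list_all_jobs are hypothetical helper names, and curl and jq are installed. Each job is printed as its job_id and name, and the loop stops once has_more is no longer true.

```shell
# Sketch only. Assumed (not from the article): DATABRICKS_HOST holds the
# workspace URL, e.g. https://<databricks-instance>, and DATABRICKS_TOKEN
# holds a personal access token. Requires curl and jq.

# Hypothetical helper: fetch one page of up to 25 jobs.
# $1 is the page token from the previous response; empty on the first call.
fetch_page() {
  if [ -n "$1" ]; then
    set -- --data-urlencode "page_token=$1"
  else
    set --
  fi
  # --get appends the url-encoded parameters to the URL as a query string
  curl --silent --show-error --fail --location --get \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --data-urlencode "limit=25" "$@" \
    "${DATABRICKS_HOST}/api/2.1/jobs/list"
}

# Hypothetical helper: print "job_id<TAB>name" for every job in the workspace.
list_all_jobs() {
  local token="" page
  while :; do
    page=$(fetch_page "$token") || return 1
    printf '%s\n' "$page" | jq -r '.jobs[]? | "\(.job_id)\t\(.settings.name)"'
    # Stop when the response no longer reports more pages.
    if [ "$(printf '%s\n' "$page" | jq -r '.has_more // false')" != "true" ]; then
      break
    fi
    # Pass next_page_token as page_token on the next call; stop if it is missing.
    token=$(printf '%s\n' "$page" | jq -r '.next_page_token // empty')
    [ -n "$token" ] || break
  done
}
```

To run it, export DATABRICKS_HOST='https://<databricks-instance>' and DATABRICKS_TOKEN='<personal-access-token>', then call list_all_jobs. Sending the token through --data-urlencode escapes any special characters in it, and the missing-token check keeps the loop from spinning if a response omits next_page_token.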
4) Use jq to filter results
You can use jq to filter the JSON that list returns. For example, if you pipe the response through jq '.deb', jq outputs the value stored under the top-level key deb, or null if the response has no such key.
%sh curl --location --header 'Authorization: Bearer <personal-access-token>' --request GET 'https://<databricks-instance>/api/2.1/jobs/list?limit=25&offset=0' | jq '.deb'
You can reference multiple keys in one jq filter. For example, jq '.deb, .last_updated' outputs the value of each key in turn.
%sh curl --location --header 'Authorization: Bearer <personal-access-token>' --request GET 'https://<databricks-instance>/api/2.1/jobs/list?limit=25&offset=0' | jq '.deb, .last_updated'
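Because list wraps the jobs in a jobs array, filtering individual jobs means iterating over that array. The function below, filter_jobs_by_name, is a hypothetical helper that reads a list response on stdin and prints the job_id and name of each job whose name contains a given substring. The sample JSON is made up for illustration and only mimics the shape of a real response.

```shell
# Hypothetical helper: read a jobs/list response on stdin and print
# "job_id<TAB>name" for each job whose name contains the substring $1.
# Jobs without a name are treated as having an empty name.
filter_jobs_by_name() {
  jq -r --arg pat "$1" '.jobs[]? | select((.settings.name // "") | contains($pat)) | "\(.job_id)\t\(.settings.name)"'
}

# Made-up sample response; prints job 1 (nightly-etl) only
echo '{"jobs":[{"job_id":1,"settings":{"name":"nightly-etl"}},{"job_id":2,"settings":{"name":"adhoc"}}]}' \
  | filter_jobs_by_name etl
```

In a notebook, you can pipe the curl command from this step into filter_jobs_by_name in place of the jq '.deb' filter.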