Iterate through all jobs in the workspace using Jobs API 2.1

Use the Jobs API 2.1 to iterate through and display a list of jobs in your workspace.

Written by debayan.mukherjee

Last published at: July 28th, 2023

In the Databricks Jobs API 2.0 (AWS | Azure | GCP), the list command returns an unbounded number of job descriptions.

In the Jobs API 2.1 (AWS | Azure | GCP), this behavior has changed. The list command now returns a maximum of 25 jobs, from newest to oldest, at a time.

In this article we show you how to manually iterate through all of the jobs in your workspace.

Instructions

1) Determine the total number of jobs in your workspace

  1. Click Workflows in the sidebar.
  2. Scroll to the bottom of the page.
  3. The total number of jobs in the workspace is listed in the bottom right.

2) Determine the values to use for offset and limit

The list command has two modifiers, limit and offset. offset sets the number of jobs that are skipped before the first one is displayed. limit sets the number of jobs (up to 25) that are displayed. By using the two modifiers together, you can display a specific range of jobs out of the total.

For example, if there are 20 total jobs in the workspace and you specify a limit of 10 and an offset of 0, list returns jobs 1-10 (the 10 most recent jobs created, not the most recent job runs). Alternatively, if you specify a limit of 10 and an offset of 10, list returns jobs 11-20.

You should consider the total number of jobs in your workspace and choose values for limit and offset that allow you to easily iterate through the total number of jobs.
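The limit and offset arithmetic described above can be sketched in Python. This is a minimal illustration, not part of the Jobs API: the helper name page_windows is hypothetical, and the cap of 25 is the maximum stated in this article.

```python
def page_windows(total_jobs, limit=25):
    """Yield (offset, limit) pairs that together cover total_jobs jobs.

    Hypothetical helper for illustration; 25 is the per-call maximum
    stated in this article.
    """
    if not 1 <= limit <= 25:
        raise ValueError("limit must be between 1 and 25")
    # Each window starts where the previous one ended.
    for offset in range(0, total_jobs, limit):
        yield offset, limit


# The example from the text: 20 jobs, displayed 10 at a time.
for offset, limit in page_windows(20, limit=10):
    print(f"limit={limit}&offset={offset}")
```

Each printed line is a query string for one list call. The last window may return fewer jobs than limit when the total is not an exact multiple.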

3) Iterate through the jobs

You need to iterate through the total number of jobs. This article iterates through all of the jobs from a notebook, using curl to access the API. It assumes the list of jobs is large and displays the maximum of 25 at a time.

Review the Authentication using Databricks personal access tokens (AWS | Azure | GCP) documentation for more information on creating and using personal access tokens.

For the first call, use limit=25 and no page token:

%sh
curl --location --header 'Authorization: Bearer <personal-access-token>' --request GET 'https://<databricks-instance>/api/2.1/jobs/list?limit=25'

For each subsequent call, replace <page_token> with the next_page_token value from the previous response:

%sh
curl --location --header 'Authorization: Bearer <personal-access-token>' --request GET 'https://<databricks-instance>/api/2.1/jobs/list?limit=25&page_token=<page_token>'

You can continue to iterate through the total number of jobs, displaying 25 at a time, until all of the jobs have been displayed.
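The same loop can be sketched in Python for a notebook cell. This is a hedged example, not an official client: the function names fetch_page and iter_jobs are hypothetical, and the code assumes the response carries the jobs under a jobs key and the token for the next page under next_page_token.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(host, token, page_token=None, limit=25):
    """GET one page of /api/2.1/jobs/list and return the parsed JSON body."""
    params = {"limit": limit}
    if page_token:
        params["page_token"] = page_token
    url = f"https://{host}/api/2.1/jobs/list?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_jobs(get_page):
    """Yield every job, requesting pages until no further token is returned.

    get_page is any callable that takes a page token (or None for the
    first call) and returns one parsed response.
    """
    page_token = None
    while True:
        page = get_page(page_token)
        yield from page.get("jobs", [])           # assumed response key
        page_token = page.get("next_page_token")  # assumed response key
        if not page_token:
            break
```

To use it, pass a callable that binds your workspace and token, for example iter_jobs(lambda t: fetch_page("<databricks-instance>", "<personal-access-token>", t)), and loop over the result. Keeping the HTTP call separate from the loop makes the paging logic easy to check against canned responses.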

4) Use jq to filter results

Info

jq can be described as "sed for JSON data". You can use it to slice, filter, map, and transform structured data.

You can use jq to filter for specific results. For example, if you pipe the results of your list request through jq '.deb', it returns the value stored under the key deb.

%sh
curl --location --header 'Authorization: Bearer <personal-access-token>'  --request GET 'https://<databricks-instance>/api/2.1/jobs/list?limit=25&offset=0' | jq '.deb'


You can include multiple keys when using jq. For example, jq '.deb, .last_updated' returns the values of both keys.

%sh
curl --location --header 'Authorization: Bearer <personal-access-token>'  --request GET 'https://<databricks-instance>/api/2.1/jobs/list?limit=25&offset=0' | jq '.deb, .last_updated'
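If jq is not available, the same key selection can be approximated in Python once the response has been parsed. This is a minimal sketch: select_keys is a hypothetical helper, the sample body is made up for illustration, and deb and last_updated are the placeholder keys used in this article.

```python
import json


def select_keys(document, *keys):
    """Return the value of each top-level key, or None when it is absent.

    Mirrors how jq '.a, .b' emits one value per key and null for a
    missing key.
    """
    return [document.get(key) for key in keys]


# Made-up response body using the article's placeholder keys.
body = json.loads('{"deb": "example", "last_updated": 1690502400}')
print(select_keys(body, "deb", "last_updated"))
```

Like the jq filter, this only looks at top-level keys of the response object. It does not search inside nested objects or arrays.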