Introduction
You want to find untitled jobs and stale, unused jobs so that you can remove them efficiently. These jobs still count toward a workspace’s job limit.
Instructions
Check for untitled jobs
First, install the Databricks SDK.
%pip install databricks-sdk
Then, run the following code in a notebook to count the jobs whose names start with “Untitled”. The code runs as the current user and only sees the jobs that user can access, so make sure you have access to all jobs, or ask an admin user with that access for help.
from databricks.sdk import WorkspaceClient
import re

# Finds jobs whose names start with "Untitled" and returns the count.
count = 0
w = WorkspaceClient()
for job in w.jobs.list():
    # A job name can be unset, so fall back to an empty string before matching.
    if re.match('Untitled.*', job.settings.name or ''):
        count += 1
count
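A count alone does not tell you which jobs to review. The following is a minimal sketch, not part of the original procedure, that collects the job ID and name of each matching job instead. The helper name `untitled_jobs`, its `prefix` parameter, and the sample objects are hypothetical; the sample objects only imitate the `job_id` and `settings.name` attributes used in the code above.

```python
from types import SimpleNamespace


def untitled_jobs(jobs, prefix="Untitled"):
    """Return (job_id, name) pairs for jobs whose name starts with prefix."""
    matches = []
    for job in jobs:
        # settings or name may be unset on some jobs; treat that as an empty name.
        settings = getattr(job, "settings", None)
        name = getattr(settings, "name", None) or ""
        if name.startswith(prefix):
            matches.append((job.job_id, name))
    return matches


# Stand-in objects shaped like the items yielded by w.jobs.list() (illustrative data only).
sample_jobs = [
    SimpleNamespace(job_id=1, settings=SimpleNamespace(name="Untitled")),
    SimpleNamespace(job_id=2, settings=SimpleNamespace(name="nightly-etl")),
    SimpleNamespace(job_id=3, settings=SimpleNamespace(name=None)),
    SimpleNamespace(job_id=4, settings=SimpleNamespace(name="Untitled 2")),
]
review_list = untitled_jobs(sample_jobs)
```

In a notebook, you would pass `WorkspaceClient().jobs.list()` in place of `sample_jobs` to get the list of real jobs to review.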
Check for stale, unused jobs
Here, a stale job is one that was created more than 30 days ago and has no runs in the last 60 days. Run the following code in a notebook to count the jobs that meet both conditions and are candidates for deletion.
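from databricks.sdk import WorkspaceClient
from datetime import date

# Finds jobs which are older than 30 days and have no recent (last 60 days) runs.
# Returns a count of jobs which can be deleted.
# Note: runs are not filtered by date. This assumes the workspace keeps run
# history for about 60 days, so a job with no listed runs has no recent runs.
days_ago = 30
count = 0
w = WorkspaceClient()
for job in w.jobs.list():
    # created_time is in epoch milliseconds.
    delta = date.today() - date.fromtimestamp(job.created_time / 1000)
    if delta.days > days_ago:
        # any() stops at the first run returned, so the full history is not fetched.
        if not any(True for _ in w.jobs.list_runs(job_id=job.job_id)):
            count += 1
count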
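The staleness check above combines two conditions: the job's age, computed from a creation time in epoch milliseconds, and the absence of listed runs. The sketch below isolates that logic so it can be checked without a workspace. The function names `job_age_days` and `is_stale`, the `min_age_days` and `today` parameters, and the sample timestamp are assumptions made for illustration.

```python
from datetime import date, datetime, timezone


def job_age_days(created_time_ms, today=None):
    """Days between a job's creation date and today; created_time_ms is epoch milliseconds."""
    today = today or date.today()
    return (today - date.fromtimestamp(created_time_ms / 1000)).days


def is_stale(created_time_ms, has_runs, min_age_days=30, today=None):
    """A job counts as stale when it is older than min_age_days and has no listed runs."""
    return job_age_days(created_time_ms, today) > min_age_days and not has_runs


# Illustrative timestamp: a job created at noon UTC on 1 January 2024.
created_ms = int(datetime(2024, 1, 1, 12, tzinfo=timezone.utc).timestamp() * 1000)

old_and_idle = is_stale(created_ms, has_runs=False, today=date(2024, 3, 1))
old_but_active = is_stale(created_ms, has_runs=True, today=date(2024, 3, 1))
too_new = is_stale(created_ms, has_runs=False, today=date(2024, 1, 20))
```

Because `date.fromtimestamp` converts using the local time zone, the computed age can differ by a day between environments. With a 30-day threshold that rarely matters, but it is worth keeping in mind if you tighten the threshold.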
Extra: check the number of existing jobs in the workspace
If you think you are close to the workspace’s job limit, you can proactively check the number of existing jobs. Run the following code in a notebook to return the total number of jobs.
from databricks.sdk import WorkspaceClient

# Returns the total number of jobs in the workspace.
w = WorkspaceClient()
len(list(w.jobs.list()))
After you identify untitled or stale jobs, remove them according to your organization’s policies.
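If your policies allow scripted cleanup, one cautious pattern is to preview the deletions before performing them. The following is a sketch under assumptions, not a prescribed procedure: `delete_jobs` and its `dry_run` flag are hypothetical, and it assumes the client exposes `jobs.delete(job_id=...)` as the Databricks SDK's `WorkspaceClient` is understood to. Confirm the call against the SDK version you have installed. The stand-in client below only records calls so the logic can be tried without touching a workspace.

```python
from types import SimpleNamespace


def delete_jobs(client, job_ids, dry_run=True):
    """Delete the given jobs, or only report them when dry_run is True."""
    deleted = []
    for job_id in job_ids:
        if dry_run:
            # Preview mode: report the job and leave it in place.
            print(f"Would delete job {job_id}")
            continue
        client.jobs.delete(job_id=job_id)
        deleted.append(job_id)
    return deleted


# Stand-in client that records calls instead of contacting a workspace (illustrative only).
calls = []
fake_client = SimpleNamespace(
    jobs=SimpleNamespace(delete=lambda job_id: calls.append(job_id))
)

preview = delete_jobs(fake_client, [101, 102])                 # dry run: nothing is deleted
removed = delete_jobs(fake_client, [101, 102], dry_run=False)  # performs the deletions
```

Defaulting `dry_run` to `True` means a deletion only happens when it is requested explicitly, which gives you a chance to review the list of job IDs first.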