Use audit logs to identify who deleted a cluster

You can use audit logs to identify who deleted a cluster configuration.

Written by John.Lourdu

Last published at: October 31st, 2022

By default, all-purpose cluster configurations are deleted 30 days after the cluster was last terminated. It is possible to keep a cluster configuration for longer than 30 days if an administrator pins the cluster.

In either situation, it is possible for an administrator to manually delete a cluster configuration at any time.

If you try to run a job on a cluster whose configuration has been deleted, the run fails with the following error message:

Run executed on existing cluster ID <cluster_id> failed since the cluster does not exist.


Databricks audit logs can be used to record the activities in your workspace, allowing you to monitor detailed Databricks usage patterns. 

Audit logging is NOT enabled by default and requires a few API calls to initialize the feature. 

Please review the Configure audit logging documentation for instructions on how to set up audit logging in your Databricks workspace.

If a cluster configuration is deleted unexpectedly, you can use the audit logs to identify who deleted the cluster configuration and when it was deleted.

Instructions

Once audit logging is enabled on your workspace, you can use it to find information on who deleted a specific cluster configuration.

Load audit logs

Before you can search through the audit logs, you must load them as a DataFrame and register the DataFrame as a temporary view so that it can be queried as a table.

You will need to provide the S3 bucket name, the full path to the audit logs, and a name for the table.

Please review the Working with data in Amazon S3 documentation for more information.

%scala

val df = spark.read.format("json").load("s3a://<s3-bucket-name>/<path-to-audit-logs>")
df.createOrReplaceTempView("<audit-logs>")

Query audit log table

Once you have the audit logs in a table, you can use SQL to query them.

This article contains two example queries, showing how to find information on a specific cluster, as well as how to view all clusters that were deleted within a specific date range.

You can use these examples to build your own custom queries.

Display information on a specific cluster

This example query returns details on the cluster deletion event, such as who deleted the cluster and when it was deleted.

You need to provide the name of the audit log table and the cluster ID of the deleted cluster.

%sql

select
  workspaceId,
  userIdentity.email,
  sourceIPAddress,
  to_timestamp(timestamp / 1000) as eventTimeStamp,
  serviceName,
  actionName,
  requestParams.cluster_id as clusterId
from
  <audit-logs>
where
  serviceName = "clusters"
  AND actionName = "permanentDelete"
  AND requestParams.cluster_id = "<cluster-id>"
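
The same lookup can also be sketched with the DataFrame API instead of SQL. The example below is a minimal sketch, not a definitive implementation: the two JSON records, the email addresses, the IP addresses, and the cluster ID are made-up placeholders standing in for real audit logs, and the names `sampleLogs`, `auditDf`, and `deletions` are chosen for illustration only. In a notebook where the audit logs are already loaded, you could apply the same filter to your own DataFrame instead of `auditDf`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Reuses the notebook's session if one exists; otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("audit-log-sketch").getOrCreate()
import spark.implicits._

// Made-up sample records standing in for real audit logs (every value is a placeholder).
val sampleLogs = Seq(
  """{"workspaceId":1,"serviceName":"clusters","actionName":"permanentDelete","timestamp":1667174400000,"sourceIPAddress":"203.0.113.10","userIdentity":{"email":"admin@example.com"},"requestParams":{"cluster_id":"1031-000000-example1"}}""",
  """{"workspaceId":1,"serviceName":"clusters","actionName":"start","timestamp":1667088000000,"sourceIPAddress":"203.0.113.11","userIdentity":{"email":"user@example.com"},"requestParams":{"cluster_id":"1031-000000-example1"}}"""
).toDS()
val auditDf = spark.read.json(sampleLogs)

// Hypothetical cluster ID to search for.
val clusterId = "1031-000000-example1"

// Keep only permanent-delete events for this cluster and select the same
// columns as the SQL query. The timestamp is in epoch milliseconds, so it is
// divided by 1000 before being cast to a timestamp.
val deletions = auditDf
  .filter(
    col("serviceName") === "clusters" &&
      col("actionName") === "permanentDelete" &&
      col("requestParams.cluster_id") === clusterId)
  .select(
    col("workspaceId"),
    col("userIdentity.email"),
    col("sourceIPAddress"),
    (col("timestamp") / 1000).cast("timestamp").as("eventTimeStamp"),
    col("serviceName"),
    col("actionName"),
    col("requestParams.cluster_id").as("clusterId"))

deletions.show(false)
```

With the sample data above, only the `permanentDelete` record should remain after filtering; the `start` event for the same cluster is excluded.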

Display clusters deleted within a specific range

This example query returns a list of all clusters that were deleted during a specific date range.

You need to provide the name of the audit log table as well as the start date and the end date of the search period.

%sql

select
  workspaceId,
  userIdentity.email,
  sourceIPAddress,
  to_timestamp(timestamp / 1000) as eventTimeStamp,
  serviceName,
  actionName,
  requestParams.cluster_id as clusterId
from
  <audit-logs>
where
  serviceName = "clusters"
  AND actionName = "permanentDelete"
  AND date >= "<start-date>"  -- Date is in yyyy-MM-dd format
  AND date <= "<end-date>"  -- Date is in yyyy-MM-dd format
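
Because the date filter expects yyyy-MM-dd values, it can help to validate the start and end dates before substituting them into the query. The following is a minimal sketch in plain Scala; the helper name `dateFilter` and its return shape are hypothetical, not part of any Databricks API.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical helper: returns the date portion of the WHERE clause,
// or an error message if the inputs are not usable.
def dateFilter(start: String, end: String): Either[String, String] = {
  // LocalDate.parse expects ISO yyyy-MM-dd by default and rejects other formats.
  val parsed = for {
    s <- Try(LocalDate.parse(start)).toOption
    e <- Try(LocalDate.parse(end)).toOption
  } yield (s, e)

  parsed match {
    case None => Left(s"Dates must use yyyy-MM-dd: '$start', '$end'")
    case Some((s, e)) if s.isAfter(e) => Left(s"Start date $s is after end date $e")
    case Some((s, e)) => Right(s"date >= '$s' AND date <= '$e'")
  }
}

println(dateFilter("2022-10-01", "2022-10-31")) // valid range: Right(...)
println(dateFilter("10/01/2022", "2022-10-31")) // wrong format: Left(...)
```

A `Right` result carries the filter text to append to the query; a `Left` result carries a message explaining why the inputs were rejected.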


Info

If your queries do not return any results for a cluster, it means that the cluster configuration was unpinned and was automatically deleted more than 30 days ago.