Fetching the last access date/time of tables across all workspaces runs slowly and inefficiently

Databricks recommends using audit log system tables.

Written by caio.cominato

Last published at: December 12th, 2024

Problem

You need to retrieve the last access date/time of a table across all workspaces, but the per-table approach using SHOW TABLE EXTENDED command, or checking the insights tab, seems slow and inefficient. 

 

Cause

SHOW TABLE EXTENDED queries each table one at a time, one workspace at a time, causing a surge of requests to hit your metastore. 

 

Solution

Databricks recommends using audit log system tables. 

  1. First, check that you have the necessary permissions to access audit logs. You can grant permissions using the Databricks SQL query GRANT SELECT ON system.access.audit TO <username>.
  2. Run the following SQL query to retrieve the last access date/time of a table. Replace <your-database-name> and <your-table-name> with your respective database and table names.

 

SELECT
action_name as `EVENT`,
event_time as `WHEN`,
request_params, user_identity.email,
IFNULL(request_params.full_name_arg, 'Non-specific') AS `TABLE ACCESSED`,
IFNULL(request_params.commandText,'GET table') AS `QUERY TEXT`
FROM system.access.audit
WHERE request_params.full_name_arg = 'catalog_<your-database-name>.<your-table-name>'
AND action_name IN ('createTable', 'commandSubmit','getTable','deleteTable')
order by event_time DESC

 

Note

Audit logs maintain information for all workspaces in your account for the same cloud region. If the workspace is available in a different region, query separately from that region.

For more information, refer to the Audit log system table reference (AWSAzureGCP) documentation.