Problem
You want to track the usage history (SELECT queries) for specific tables in Databricks, but you have difficulty identifying which users have executed SELECT commands against arbitrary schemas or tables.
Cause
You may not be familiar with writing the SQL queries needed to access system table information.
Solution
The audit logs can be used to identify users who have performed getTable actions (that is, SELECT commands) against the specified tables. The audit logs are stored in the system.access.audit table, which you can query using Apache Spark SQL.

Add the desired date range to the query's event_date parameter and include the full path to the schemas you wish to audit (for example, catalog.schema).
%sql
SELECT
  event_time,
  event_date,
  user_identity.email AS user_email,
  service_name,
  action_name,
  request_params,
  source_ip_address,
  user_agent
FROM system.access.audit
WHERE action_name = 'getTable'
  AND (
    request_params['full_name_arg'] LIKE '%<add-schema1-name-here>%'
    OR request_params['full_name_arg'] LIKE '%<add-schema2-name-here>%'
    OR request_params['full_name_arg'] LIKE '%<add-schema3-name-here>%'
  )
  AND event_date BETWEEN '<start-date-as-yyyy-mm-dd>' AND '<end-date-as-yyyy-mm-dd>'
ORDER BY event_time DESC;
The query output contains every table under the listed schemas that was selected during the date range. The request_params field contains the table name, and user_identity identifies the user who performed the action. If there is no output, no users have selected tables under the schemas you included in the query.
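If you want a summary instead of raw events, you can aggregate the same audit data. The following is a sketch built on the query above, not an official example; it assumes the same placeholders and counts SELECT activity per user and table:

```sql
%sql
-- Sketch: summarize getTable (SELECT) activity per user and table.
-- Replace the schema and date placeholders as in the query above.
SELECT
  user_identity.email AS user_email,
  request_params['full_name_arg'] AS table_full_name,
  COUNT(*) AS select_count,            -- number of SELECTs in the date range
  MAX(event_time) AS last_accessed     -- most recent access
FROM system.access.audit
WHERE action_name = 'getTable'
  AND request_params['full_name_arg'] LIKE '%<add-schema1-name-here>%'
  AND event_date BETWEEN '<start-date-as-yyyy-mm-dd>' AND '<end-date-as-yyyy-mm-dd>'
GROUP BY user_email, table_full_name
ORDER BY select_count DESC;
```

This variant is useful for spotting the most frequently read tables and their heaviest users at a glance, while the detailed query above remains better for investigating individual events.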