Need to track DBU consumption per cluster and present clusters in the workspace

Use system tables to query DBU consumption.

Last published at: January 14th, 2025

Problem

You want to query the system.compute.clusters and system.billing.usage tables to gain insights into DBU consumption per cluster and identify all-purpose and job clusters within your workspace.

Cause

You may want these insights for multiple reasons, including old deleted clusters, potential field variations, and SQL query adjustments.

Deleted clusters

Clusters deleted before specific dates may not be included in the query results.

Field variations

Differences in the change_time and delete_time fields could sometimes cause cluster information to appear differently than expected.

SQL query adjustments

Queries may need to incorporate recent changes or deletions in cluster data to provide more accurate results.

Solution

These steps can be implemented to achieve more refined results by utilizing multiple system tables.

Enable required system schemas

While the system.billing schema is enabled by default, the system.compute schema needs to be enabled manually.
For more information, please review the Enable system table schemas (AWS | Azure | GCP) documentation.

Refine SQL queries

Adjust queries to include recent cluster changes and take deleted clusters into account for more accurate results.
Applying filters and joins using fields like change_time and delete_time can help provide a comprehensive view of cluster states.

Ensure accurate results from the `system.compute.clusters` table

Ensure your SQL query correctly accounts for the latest changes and deletions in cluster data.

Example code

In this example code block, we are fetching the clusters present in the workspace created via UI and API.

SELECT DISTINCT 
    c.workspace_id, 
    c.cluster_id, 
    c.cluster_name, 
    MAX(c.create_time) AS first_created, 
    MAX(c.change_time) AS last_changed, 
    c.driver_node_type, 
    c.worker_node_type, 
    c.min_autoscale_workers, 
    c.max_autoscale_workers, 
    c.auto_termination_minutes, 
    c.dbr_version, 
    c.cluster_source
FROM 
    system.compute.clusters c
INNER JOIN (
    SELECT 
        cluster_id, 
        MAX(change_time) AS max_change_time 
    FROM 
        system.compute.clusters 
    GROUP BY 
        cluster_id
) m 
    ON c.cluster_id = m.cluster_id 
    AND c.change_time = m.max_change_time
WHERE 
    c.delete_time IS NULL 
    AND c.cluster_source IN ('UI', 'API') 
    AND c.workspace_id = <enter-your-workspace-id>
GROUP BY 
    c.workspace_id, 
    c.cluster_id, 
    c.cluster_name, 
    c.driver_node_type, 
    c.worker_node_type, 
    c.min_autoscale_workers, 
    c.max_autoscale_workers, 
    c.auto_termination_minutes, 
    c.dbr_version, 
    c.cluster_source
ORDER BY 
    c.workspace_id, 
    c.cluster_id;

Retrieve DBU consumption per cluster.

Example code

In this example code block, we are fetching the DBUs consumed per cluster on a monthly basis.

SELECT 
    date_format(u.usage_date, 'yyyy-MM') AS `Month`, 
    c.cluster_name AS `Cluster Name`, 
    c.cluster_id AS `Cluster ID`, 
    SUM(u.usage_quantity) AS `Total DBUs Consumed`
FROM 
    system.billing.usage u
INNER JOIN 
    system.compute.clusters c 
    ON u.usage_metadata.cluster_id = c.cluster_id
WHERE 
    c.cluster_name LIKE <cluster-name>
    AND c.cluster_id LIKE <cluster-id>
GROUP BY 
    date_format(u.usage_date, 'yyyy-MM'), 
    c.cluster_name, 
    c.cluster_id
ORDER BY 
    `Month` ASC, 
    `Cluster Name` ASC;

Verify that the delete_time field is correctly handled in your queries to exclude deleted clusters.

SELECT * FROM system.compute.clusters WHERE delete_time IS NULL;

Verify the sample queries with known data to ensure they return the expected results. If needed, customize the examples based on your specific workspace and cluster configurations.

For more information regarding the system.compute.clusters table, please review the Compute system tables reference (AWS | Azure | GCP) documentation.

Databricks Help Center

Problem

Cause

Deleted clusters

Field variations

SQL query adjustments

Solution

Enable required system schemas

Refine SQL queries

Ensure accurate results from the system.compute.clusters table

Example code

Example code

Contact Us

Ensure accurate results from the `system.compute.clusters` table