Problem
Your Databricks SQL warehouse continues running even when no new queries are submitted. A queue of low-cost queries may also build up, impacting performance. You want to identify long-running queries so you can prevent wasted resources and optimize query execution.
Cause
Long-running queries keep a warehouse in use for an extended period of time.
A query may run longer than expected due to:
- Complex joins, large datasets, or inefficient execution plans.
- High workload demand on the warehouse.
- Insufficient compute resources, leading to slower processing.
- Queries stuck in the queue, delaying execution.
Solution
There are multiple ways to identify long-running queries.
Query history UI
In the workspace, in the left side panel under SQL, click Query History.
You can filter queries by time range, Compute (warehouse), Duration, and Status (such as running or finished).
Info
The Peak query count chart, which appears in the Monitoring tab of a selected SQL warehouse, is often misinterpreted as showing all running queries. However, it only highlights queries that place a non-trivial load on the warehouse, which are the queries that affect autoscaling behavior. For a complete view of all running queries, use the Query History UI, the Databricks API, or system tables.
Query history API
Use the List Queries (AWS | Azure | GCP) API to fetch long-running queries dynamically.
The example script helps identify queries that have exceeded 2 minutes of total execution time. It filters queries based on the specified warehouse and time period while also providing the total count of queries that surpass the duration threshold.
Example code
%python
import requests
import json
import time

# Databricks configuration
DATABRICKS_INSTANCE = "https://<workspace-instance-url>"
API_TOKEN = "<personal-access-token>"
API_URL = f"{DATABRICKS_INSTANCE}/api/2.0/sql/history/queries"

# Convert a date/time string to epoch milliseconds
def to_milliseconds(date_string, format="%Y-%m-%d %H:%M:%S"):
    return int(time.mktime(time.strptime(date_string, format)) * 1000)

# Define the time range (modify as needed)
start_time_str = "<start-time-for-query-search-window>"
end_time_str = "<end-time-of-the-search-window>"

# Convert to milliseconds
start_time_ms = to_milliseconds(start_time_str)
end_time_ms = to_milliseconds(end_time_str)

# API request body
payload = {
    "filter_by": {
        "statuses": ["FINISHED", "RUNNING"],  # Example: filter by query statuses
        "warehouse_ids": ["<warehouse-id>"],  # List of warehouse IDs
        "query_start_time_range": {
            "start_time_ms": start_time_ms,
            "end_time_ms": end_time_ms
        }
    }
}

# Headers
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json"
}

# Make the API request
response = requests.get(API_URL, headers=headers, json=payload)

# Handle the response
if response.status_code == 200:
    data = response.json()
    queries = data.get("res", [])

    # Filter queries where duration > 120000 ms (2 minutes)
    long_queries = [q for q in queries if q.get("duration", 0) > 120000]

    # Print the filtered queries
    print("\nQueries with duration > 2 minutes:\n")
    for query in long_queries:
        print(json.dumps(query, indent=4))

    print(f"\nTotal queries with duration > 2 minutes: {len(long_queries)}")
else:
    print(f"Error: {response.status_code} - {response.text}")
Info
For proactive monitoring, schedule a job to call the query history API regularly and set up alerts based on the output. This can help you detect and act on long-running queries before they impact performance or cost.
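The following is a minimal sketch of such a scheduled check, reusing the same workspace URL, token, and warehouse ID placeholders as the example above. Failing the job run when long-running queries are found is only one way to surface an alert; the job's failure notifications, or your own alerting tooling, can then pick it up.
import time
import requests

# Same placeholders as the example above.
DATABRICKS_INSTANCE = "https://<workspace-instance-url>"
API_TOKEN = "<personal-access-token>"
API_URL = f"{DATABRICKS_INSTANCE}/api/2.0/sql/history/queries"

DURATION_THRESHOLD_MS = 2 * 60 * 1000  # Flag queries longer than 2 minutes

# Look at queries started in the last hour; align this window with the job's schedule.
now_ms = int(time.time() * 1000)
payload = {
    "filter_by": {
        "statuses": ["RUNNING"],
        "warehouse_ids": ["<warehouse-id>"],
        "query_start_time_range": {
            "start_time_ms": now_ms - 60 * 60 * 1000,
            "end_time_ms": now_ms
        }
    }
}

response = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload
)
response.raise_for_status()

long_queries = [
    q for q in response.json().get("res", [])
    if q.get("duration", 0) > DURATION_THRESHOLD_MS
]

# Failing the run is one simple way to trigger the job's failure notifications;
# replace this with whatever alerting mechanism you use.
if long_queries:
    raise Exception(f"{len(long_queries)} queries have been running for more than 2 minutes")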
System table query history
You can retrieve information on long-running queries from the Query history system table (AWS | Azure | GCP).
The example SQL query retrieves long-running queries that are still running or have already finished. It filters queries based on execution duration, warehouse, and time period.
Example code
%sql
SELECT
  statement_id,
  statement_text,
  executed_by,
  execution_status,
  start_time,
  end_time,
  total_duration_ms,
  execution_duration_ms,
  compilation_duration_ms,
  result_fetch_duration_ms,
  compute.warehouse_id
FROM system.query_history
WHERE
  total_duration_ms > 2 * 60 * 1000 -- Queries running longer than 2 minutes
  AND start_time >= TIMESTAMPADD(DAY, -8, CURRENT_TIMESTAMP) -- Queries from the last 8 days
  AND (compute.warehouse_id = '<warehouse-id>') -- Filter by warehouse ID (modify as needed)
  AND execution_status IN ('RUNNING', 'FINISHED') -- Running or completed queries
ORDER BY total_duration_ms DESC; -- Longest-running queries first
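If you prefer to act on the results programmatically, one option is to run the same query from a Python notebook cell and work with it as a DataFrame. This is a minimal sketch that assumes the same <warehouse-id> placeholder and relies on the spark and display objects available in Databricks notebooks.
%python
# Minimal sketch: run the system table query from a notebook and
# work with the result as a DataFrame.
long_running_df = spark.sql("""
    SELECT
      statement_id,
      executed_by,
      execution_status,
      total_duration_ms,
      compute.warehouse_id
    FROM system.query_history
    WHERE
      total_duration_ms > 2 * 60 * 1000
      AND start_time >= TIMESTAMPADD(DAY, -8, CURRENT_TIMESTAMP)
      AND compute.warehouse_id = '<warehouse-id>'
      AND execution_status IN ('RUNNING', 'FINISHED')
    ORDER BY total_duration_ms DESC
""")

display(long_running_df)

# The count can feed a simple threshold check, for example in a scheduled job.
print(f"Long-running queries in the last 8 days: {long_running_df.count()}")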