Problem
When you open the Metrics tab to access the Ganglia metrics for a cluster, the Historical metrics snapshots list contains no files. The file list shows the message “No metrics found.”
Cause
Ganglia snapshots are taken every 15 minutes by default. If the cluster runs for less than 15 minutes and its lifetime does not overlap a scheduled snapshot, no Ganglia metrics are collected.
Note
Ganglia metrics are only available for Databricks Runtime 12.2 LTS and below for AWS and Azure. Ganglia is not supported on Databricks on Google Cloud.
Solution
Databricks recommends using the Compute metrics feature introduced as of Databricks Runtime 13.0, where metrics are collected every minute.
For more information, please review the Manage compute (AWS | Azure) documentation.
The new compute metrics UI gives a more comprehensive view of your cluster’s resource usage, including Spark consumption and internal Databricks processes. In contrast, the Ganglia UI only measures Spark container consumption. For additional information about how to use these new metrics, please refer to the View compute metrics (AWS | Azure) documentation.
If you need to continue using Databricks Runtime 9.1 LTS through 12.2 LTS, configure the collection period using the cluster UI:
- Set the DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES environment variable under Advanced Options > Spark > Environment Variables.
- Add the required time in minutes. For example, to take a snapshot every 5 minutes:
DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES=5
Alternatively, use the API to set the required time using the spark_env_vars field. For more information, please review the Clusters API documentation.
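As a rough illustration of the API approach, the following Python sketch builds a cluster specification that sets the snapshot period through spark_env_vars. The helper name build_cluster_spec is hypothetical, and the cluster name, runtime version, node type, and worker count are placeholder values, not recommendations. The sketch only prints the JSON body; submitting it to the Clusters API (with your workspace URL and an access token) is left out, so check the Clusters API documentation for the exact endpoint and required fields.

```python
import json


def build_cluster_spec(cluster_name, spark_version, node_type_id,
                       num_workers, snapshot_period_minutes):
    """Return a cluster spec dict that sets the Ganglia snapshot period.

    Hypothetical helper for illustration; field values are supplied by the caller.
    """
    if snapshot_period_minutes < 1:
        raise ValueError("snapshot_period_minutes must be at least 1")
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Environment variables are passed as string key/value pairs.
        "spark_env_vars": {
            "DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES": str(snapshot_period_minutes),
        },
    }


# Placeholder values (assumptions): replace with your own cluster settings.
spec = build_cluster_spec(
    cluster_name="ganglia-short-job",
    spark_version="12.2.x-scala2.12",
    node_type_id="i3.xlarge",
    num_workers=2,
    snapshot_period_minutes=5,
)

# Print the JSON body that would be sent in the API request.
print(json.dumps(spec, indent=2))
```

A shorter period helps short-lived clusters produce at least one snapshot, provided the period is shorter than the time the cluster is expected to run.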