Ganglia metrics not appearing in historical metrics snapshots list

Configure the collection period, or update Databricks Runtime to at least 13.0 to use Compute metrics instead.

Written by david.vega

Last published at: September 27th, 2024

Problem

When you navigate to your Metrics tab to access the Ganglia metrics for a cluster, you notice you have zero files in your Historical metrics snapshots list. Within the file list, you see a message, “No metrics found.” 

Cause

Ganglia snapshots are taken every 15 minutes. If the cluster runs for less than 15 minutes and its lifetime falls between two snapshots, no Ganglia metrics are collected.
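The effect of the snapshot period can be illustrated with a small calculation. The sketch below is a simplified model, not a description of how Ganglia schedules snapshots: it assumes the first snapshot fires one full period after cluster start unless another offset is given, and the function name and parameters are hypothetical.

```python
def snapshots_taken(runtime_minutes, period_minutes=15, first_snapshot_at=None):
    """Count snapshots that fall within a cluster's lifetime.

    Assumption (not from the Databricks documentation): snapshots fire at
    first_snapshot_at minutes after cluster start and then every
    period_minutes. By default the first snapshot is one full period in.
    """
    if first_snapshot_at is None:
        first_snapshot_at = period_minutes
    if runtime_minutes < first_snapshot_at:
        # The cluster terminated before any snapshot was due.
        return 0
    return int((runtime_minutes - first_snapshot_at) // period_minutes) + 1


if __name__ == "__main__":
    # A 10-minute cluster with the default 15-minute period.
    print(snapshots_taken(10))
    # The same cluster with the period lowered to 5 minutes.
    print(snapshots_taken(10, period_minutes=5))
```

Under this model, a short-lived cluster produces no snapshots at the default period, which matches the "No metrics found" symptom, while a shorter period gives it a chance to be captured.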

Note

Ganglia metrics are only available for Databricks Runtime 12.2 LTS and below for AWS and Azure. Ganglia is not supported on Databricks on Google Cloud.

 

Solution

Databricks recommends using the Compute metrics feature, introduced in Databricks Runtime 13.0, which collects metrics every minute.

For more information, please review the Manage compute (AWS | Azure) documentation.

The new compute metrics UI has a more comprehensive view of your cluster’s resource usage, including Spark consumption and internal Databricks processes. In contrast, the Ganglia UI only measures Spark container consumption. For additional information about how to use these new metrics, please refer to the View compute metrics (AWS | Azure) documentation.

If you need to continue using Databricks Runtime 9.1 LTS - 12.2 LTS, configure the collection period using the cluster UI: 

  1. In the cluster configuration, go to Advanced Options > Spark > Environment Variables and set the DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES environment variable.
  2. Set its value to the required period in minutes. For example: DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES=5

 

Alternatively, use the API to set the required time using the spark_env_vars field. For more information, please review the Clusters API documentation.
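As a minimal sketch of the API approach, the helper below adds the variable to the spark_env_vars field of a cluster specification and prints the resulting JSON body. It does not send a request. The helper name and the placeholder values in the example spec are hypothetical, and the endpoint to submit the body to, along with the fields that endpoint requires, should be taken from the Clusters API documentation.

```python
import json

ENV_VAR = "DATABRICKS_GANGLIA_SNAPSHOT_PERIOD_MINUTES"


def with_snapshot_period(cluster_spec, minutes):
    """Return a copy of cluster_spec with the snapshot period set.

    Existing entries in spark_env_vars are preserved and the input
    dictionary is not modified.
    """
    minutes = int(minutes)
    if minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    updated = dict(cluster_spec)
    env = dict(cluster_spec.get("spark_env_vars") or {})
    env[ENV_VAR] = str(minutes)  # environment variable values are strings
    updated["spark_env_vars"] = env
    return updated


if __name__ == "__main__":
    # Placeholder spec: replace every value with your cluster's real settings.
    spec = {
        "cluster_id": "<your-cluster-id>",
        "spark_version": "<your-runtime-version>",
        "node_type_id": "<your-node-type>",
        "num_workers": 1,
    }
    print(json.dumps(with_snapshot_period(spec, 5), indent=2))
```

Starting from the cluster's current specification rather than a hand-written one helps avoid accidentally dropping settings when the updated body is submitted.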