High cached memory in clusters but no active applications submitted

Clear cache, restart cluster, and add executors to manage memory.

Written by chandana.koppal

Last published at: August 30th, 2024

Problem

You notice unexpectedly high cached memory usage in your clusters even though no active applications have been submitted. The issue becomes apparent when you observe significant cached data in the cluster's Metrics tab.

Cause

Running multiple Apache Spark streaming applications or notebooks on an interactive cluster with limited memory can overload the cluster's memory. Even after these jobs complete, data cached in memory may persist, contributing to elevated memory usage. This is especially likely when the window between cluster restarts is long.

You may also have idle notebooks attached to the cluster.

Solution

Clear the cache to free up memory. In Spark, you can do this with the unpersist() method. For example, if df is a cached Spark DataFrame, run df.unpersist().
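As a minimal sketch (assuming an existing SparkSession named `spark`; the DataFrame and its contents are illustrative), caching and then releasing memory might look like:

```python
# Sketch: releasing cached data in PySpark.
# Assumes a live SparkSession bound to `spark`, e.g. in a Databricks notebook.

df = spark.range(1_000_000)   # illustrative DataFrame
df.cache()                    # mark the DataFrame for caching
df.count()                    # an action materializes the cached blocks

# ... work with df ...

df.unpersist()                # release this DataFrame's cached blocks

# To drop all cached tables and DataFrames tracked by the catalog at once:
spark.catalog.clearCache()
```

`unpersist()` frees only the one DataFrame's blocks, while `spark.catalog.clearCache()` clears everything cached on the cluster, so the latter is the heavier-handed option.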

To bring the cluster back to zero RAM usage, terminate the cluster and clear the execution context. This stops all processes and frees the memory.
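If you manage clusters programmatically, termination can also be done through the Databricks Clusters API. A hedged sketch (the workspace URL, token, and cluster ID below are placeholders you must replace with your own values):

```shell
# Terminate a running cluster via the Databricks Clusters API 2.0.
# <workspace-url>, $DATABRICKS_TOKEN, and the cluster_id are placeholders.
curl -X POST "https://<workspace-url>/api/2.0/clusters/delete" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"cluster_id": "1234-567890-abcde123"}'
```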

You can also increase the number of executors to help manage memory load. For more information, refer to the Apache Spark executor memory allocation documentation.
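Executor count and sizing are typically set through Spark configuration, for example in the cluster's Spark config. A sketch of the relevant properties (the values shown are illustrative, not recommendations; tune them for your workload):

```
# Illustrative Spark configuration for executor sizing.
spark.executor.instances 8
spark.executor.memory 8g
spark.executor.memoryOverhead 1g

# Alternatively, let Spark scale executors with demand:
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.maxExecutors 16
```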

Generally, Databricks recommends restarting the cluster often for regular clean-up, especially for interactive clusters.