Custom garbage collection prevents cluster launch

Using a custom garbage collection algorithm on Databricks Runtime 10.0 and above prevents the cluster from starting.

Written by harikrishnan.kunhumveettil

Last published at: December 8th, 2022

Problem

You are trying to use a custom garbage collection algorithm for Apache Spark (anything other than the default, parallel garbage collection) on clusters running Databricks Runtime 10.0 and above. When you try to start a cluster, it fails to start. If the configuration is set on an executor, the executor is immediately terminated.

For example, if you set either of the following options in your Spark config to select the G1 garbage collector, cluster creation fails.

Spark driver

spark.driver.extraJavaOptions -XX:+UseG1GC

Spark executor

spark.executor.extraJavaOptions -XX:+UseG1GC

Cause

Databricks Runtime 10.0 and above pass a new Java virtual machine (JVM) flag that explicitly sets the garbage collection algorithm to parallel garbage collection. If you do not change the default, this flag has no impact.

If you change the garbage collection algorithm by setting spark.executor.extraJavaOptions or spark.driver.extraJavaOptions in your Spark config, the value conflicts with the new flag, because the JVM cannot run with two garbage collectors selected at the same time. As a result, the JVM crashes at startup and prevents the cluster from starting.
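The conflict can be caught before a cluster config is submitted. The following is a minimal sketch, not a Databricks tool: it assumes the runtime's default flag is -XX:+UseParallelGC and that the helper name conflicts_with_parallel_gc and the list of collector flags are illustrative choices made for this example.

```python
import shlex

# HotSpot flags that select a collector other than parallel GC.
# Illustrative list for this sketch, not exhaustive.
OTHER_COLLECTORS = {
    "-XX:+UseG1GC",
    "-XX:+UseSerialGC",
    "-XX:+UseConcMarkSweepGC",
}

# Flag that explicitly turns off parallel GC.
DISABLE_PARALLEL = "-XX:-UseParallelGC"


def conflicts_with_parallel_gc(extra_java_options: str) -> bool:
    """Return True if the options select another collector
    without also disabling parallel GC (hypothetical helper)."""
    flags = shlex.split(extra_java_options)
    picks_other = any(flag in OTHER_COLLECTORS for flag in flags)
    return picks_other and DISABLE_PARALLEL not in flags


print(conflicts_with_parallel_gc("-XX:+UseG1GC"))
print(conflicts_with_parallel_gc("-XX:-UseParallelGC -XX:+UseG1GC"))
```

The first call reports a conflict because only the G1 flag is present; the second does not, because parallel garbage collection is disabled in the same option string.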

Solution

To work around this issue, you must explicitly disable the parallel garbage collection flag by adding -XX:-UseParallelGC alongside your custom garbage collection flag in your Spark config. This must be done at the cluster level.

spark.driver.extraJavaOptions -XX:-UseParallelGC -XX:+UseG1GC
spark.executor.extraJavaOptions -XX:-UseParallelGC -XX:+UseG1GC
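If you create clusters programmatically, the same two settings go into the cluster specification. The sketch below only builds and prints the JSON payload; it assumes the spark_conf field of the Databricks Clusters API, and the cluster name, runtime version, node type, worker count, and helper name are placeholders chosen for illustration.

```python
import json


def with_parallel_gc_disabled(gc_flag: str) -> str:
    """Prefix a collector flag with the flag that disables parallel GC
    (hypothetical helper for this example)."""
    return f"-XX:-UseParallelGC {gc_flag}"


java_options = with_parallel_gc_disabled("-XX:+UseG1GC")

cluster_spec = {
    "cluster_name": "g1gc-example",        # placeholder name
    "spark_version": "10.4.x-scala2.12",   # placeholder runtime version
    "node_type_id": "i3.xlarge",           # placeholder node type
    "num_workers": 2,                      # placeholder size
    "spark_conf": {
        # Same values as the Spark config lines in the Solution.
        "spark.driver.extraJavaOptions": java_options,
        "spark.executor.extraJavaOptions": java_options,
    },
}

print(json.dumps(cluster_spec, indent=2))
```

Setting the same option string on both the driver and the executors keeps the garbage collector consistent across the cluster.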