Getting a ConcurrentModificationException in PySpark CrossValidator on Databricks Runtime 15.4 LTS

Enable spark.databricks.property.standardClone.enabled or upgrade your Databricks Runtime version.

Last published at: July 18th, 2025

Problem

When you run CrossValidator from pyspark.ml with a large number of hyperparameter combinations on Databricks Runtime 15.4 LTS ML, the job fails with the following error stack trace.

Py4JJavaError: An error occurred while calling o6242.evaluate.
: java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1408)
	at java.util.Hashtable.putAll(Hashtable.java:523)
	at org.apache.spark.util.Utils$.cloneProperties(Utils.scala:3474)
	at org.apache.spark.SparkContext.getCredentialResolvedProperties(SparkContext.scala:523)
	at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:3157)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1104)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:454)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1102)
	at org.apache.spark.mllib.evaluation.AreaUnderCurve$.of(AreaUnderCurve.scala:44)
	at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.areaUnderROC(BinaryClassificationMetrics.scala:127)
	at org.apache.spark.ml.evaluation.BinaryClassificationEvaluator.evaluate(BinaryClassificationEvaluator.scala:101)
	at sun.reflect.GeneratedMethodAccessor323.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.lang.Thread.run(Thread.java:750)

Cause

This error typically occurs when the estimatorParamMaps passed to CrossValidator contains a large number of parameter combinations (for example, 60). The following code provides an example.

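from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Logistic regression estimator to tune (assumes "features" and "label" columns)
lr_model = LogisticRegression(featuresCol="features", labelCol="label")

# 10 elasticNetParam values x 6 regParam values = 60 combinations
param_grid = ParamGridBuilder() \
    .addGrid(lr_model.elasticNetParam, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]) \
    .addGrid(lr_model.regParam, [0, 0.001, 0.01, 0.1, 1.0, 10.0]) \
    .build()

print("Num combinations:", len(param_grid))

# Create a binary classification evaluator
evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction",
                                         labelCol="label",
                                         metricName="areaUnderROC")

# Create a cross-validator for hyperparameter tuning
cv = CrossValidator(estimator=lr_model, estimatorParamMaps=param_grid,
                    evaluator=evaluator, numFolds=5, parallelism=32)

# Fitting runs the concurrent fit-and-evaluate tasks (train_df is a placeholder DataFrame)
# cv_model = cv.fit(train_df)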

More parameter combinations mean more models are fit and evaluated concurrently (up to the parallelism setting), which makes the issue more likely. Higher concurrency also means more concurrent internal modifications and manual clones of Spark properties.
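
To put numbers on this, the following sketch counts the work in the example above. It assumes, as in recent open-source PySpark, that CrossValidator fits and evaluates every parameter map once per fold on a thread pool of min(parallelism, number of parameter maps) threads. The helper name cv_workload is hypothetical.

```python
import math


def cv_workload(grid_sizes, num_folds, parallelism):
    """Hypothetical helper: estimate CrossValidator work for a parameter grid.

    Assumes one fit-and-evaluate task per parameter map per fold, run on a
    thread pool of min(parallelism, num_models) threads.
    """
    num_models = math.prod(grid_sizes)               # size of estimatorParamMaps
    total_fits = num_models * num_folds              # one fit + evaluate per map per fold
    threads_per_fold = min(parallelism, num_models)  # tasks running at once per fold
    return num_models, total_fits, threads_per_fold


# The example grid: 10 elasticNetParam values x 6 regParam values, 5 folds
print(cv_workload([10, 6], num_folds=5, parallelism=32))  # (60, 300, 32)

# A small grid barely uses the thread pool
print(cv_workload([2, 2], num_folds=5, parallelism=32))   # (4, 20, 4)
```

With 60 combinations, all 32 threads stay busy in every fold, and each evaluation submits Spark jobs (such as the collect inside BinaryClassificationEvaluator.evaluate in the stack trace). Each job submission clones the Spark properties, so clones overlap far more often than with a small grid.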

Databricks Runtime 15.4 LTS contains a rare race condition that CrossValidator can trigger. The ConcurrentModificationException is thrown during the manual cloning of Apache Spark properties in org.apache.spark.util.Utils.cloneProperties, when another thread modifies the properties while they are being copied.
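
The failure mode is that of any fail-fast iteration: one thread copies a map entry by entry while another thread writes to it. The sketch below is a Python analogy only, not Databricks or Spark code. A dict stands in for java.util.Properties, the hypothetical unsafe_clone mimics an entry-by-entry copy like the putAll call in the stack trace, and Python raises RuntimeError where Java raises ConcurrentModificationException. The hypothetical locked_clone shows one general remedy: taking the copy under the same lock that writers hold.

```python
import threading

# Python analogy only: a dict stands in for java.util.Properties.
_props_lock = threading.Lock()


def unsafe_clone(source, on_each_entry=None):
    """Copy entry by entry. on_each_entry simulates a concurrent writer."""
    result = {}
    for key in source:  # fail-fast iteration, similar to Hashtable$Enumerator
        result[key] = source[key]
        if on_each_entry is not None:
            on_each_entry()
    return result


def set_property(source, key, value):
    """Writers take the lock before mutating."""
    with _props_lock:
        source[key] = value


def locked_clone(source):
    """Copy while holding the writers' lock, so no write can interleave."""
    with _props_lock:
        return dict(source)


props = {"spark.job.description": "fold-0", "spark.scheduler.pool": "default"}
try:
    # The simulated writer adds a new key right after the first entry is copied
    unsafe_clone(props, lambda: props.setdefault("spark.jobGroup.id", "cv"))
except RuntimeError as err:
    print("copy failed:", err)  # dictionary changed size during iteration

set_property(props, "spark.scheduler.pool", "cv-pool")
print(locked_clone(props)["spark.scheduler.pool"])  # cv-pool
```

Here the write is forced to land mid-copy so the failure is deterministic. With real threads the timing varies from run to run, which is why the Java failure is rare and intermittent.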

Solution

Enable the following Spark configuration on your cluster. It makes Spark use the standard Properties.clone method instead of the manual clone.

spark.databricks.property.standardClone.enabled true

For details on how to apply Spark configs, refer to the “Spark configuration” section of the Compute configuration reference (AWS | Azure | GCP) documentation.
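
If you manage clusters programmatically, the setting belongs in the cluster's Spark config. This sketch assumes you build a Clusters API request body yourself, where Spark settings go in the spark_conf map of string keys to string values. The helper names with_standard_clone and standard_clone_enabled are hypothetical. The check accepts any callable with a get(key, default) shape, so the same function works on a plain dict or, in a notebook, on spark.conf.get.

```python
STANDARD_CLONE_KEY = "spark.databricks.property.standardClone.enabled"


def with_standard_clone(spark_conf=None):
    """Hypothetical helper: return a copy of a spark_conf map with standard cloning enabled.

    Values are strings, matching the Clusters API spark_conf field.
    """
    updated = dict(spark_conf or {})
    updated[STANDARD_CLONE_KEY] = "true"
    return updated


def standard_clone_enabled(conf_get):
    """Hypothetical helper: check the setting through a get(key, default)-style callable."""
    return str(conf_get(STANDARD_CLONE_KEY, "false")).strip().lower() == "true"


# Partial request body; other required cluster fields are omitted
cluster_spec = {
    "spark_conf": with_standard_clone({"spark.sql.shuffle.partitions": "200"}),
}
print(standard_clone_enabled(cluster_spec["spark_conf"].get))  # True
```

After the cluster restarts with the new config, calling standard_clone_enabled(spark.conf.get) in a notebook should return True.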

Alternatively, you can upgrade to a Databricks Runtime version above 15.4 LTS.
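
To decide in code whether a cluster still needs the configuration, you can compare its runtime version against 15.4. This sketch assumes the version string comes from the DATABRICKS_RUNTIME_VERSION environment variable on the cluster (for example, "15.4"); treat that variable as an assumption and confirm it on your cluster. The helper is_later_than_15_4 is hypothetical and only encodes this article's guidance that versions above 15.4 LTS do not need the workaround.

```python
import os
import re


def is_later_than_15_4(version):
    """Hypothetical helper: True if a runtime version string such as "16.0" is above 15.4.

    Only the leading major.minor numbers are compared.
    """
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized runtime version: {version!r}")
    return (int(match.group(1)), int(match.group(2))) > (15, 4)


# Assumed environment variable; it may be absent outside a Databricks cluster
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if runtime is None:
    print("DATABRICKS_RUNTIME_VERSION is not set")
elif is_later_than_15_4(runtime):
    print(f"Runtime {runtime}: later than 15.4 LTS, no workaround needed")
else:
    print(f"Runtime {runtime}: set {'spark.databricks.property.standardClone.enabled'} or upgrade")
```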

Databricks Help Center
