ClassNotFoundException error when executing a job or notebook with a custom Kryo serializer

Use an init script or set the spark.jars property in your Apache Spark configuration.

Written by pavan.kumarchalamcharla

Last published at: March 20th, 2025

Problem

You build a JAR file that defines a custom Kryo serializer and install the file in your cluster's libraries using the API or the UI. You then add the Apache Spark properties spark.serializer org.apache.spark.serializer.KryoSerializer and spark.kryo.registrator <your-custom-kryo-class> to your cluster's configuration.
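As an illustration, the cluster's Spark config would contain entries like the following. The registrator class name here is a hypothetical placeholder; use your own class.

```
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator com.example.MyKryoRegistrator
```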

 

When you then try to execute a job or notebook, it fails with a ClassNotFoundException error.

 

Cause

When you install a custom library and define a custom serializer on a cluster, the library is loaded on the driver at startup, but not on the executors.

 

Instead, the library is made available for the executors to install on demand: an executor pulls and installs the library from the driver only when the first task that needs it runs on that executor. Because Spark needs the serializer class as soon as the executor JVM starts, before any library has been installed, the class cannot be found and the job fails.

 

Solution

There are two options. The first option is to create an init script using the following steps.

1. Instead of installing a library on the cluster using the configurations, upload the JAR file with the custom Kryo classes to your workspace file system or volume.

2. Create the following init script, using the JAR file path from the previous step.

#!/bin/sh
# Copy the custom Kryo serializer JAR to the local classpath directory,
# which is scanned when the driver and executor JVMs start.
cp <your-jar-file-path> /databricks/jars/

3. Add the init script from the previous step to the cluster configurations under the Advanced options > Init Scripts tab.

4. Make sure the custom Kryo serializer configuration is still in place. In the same Advanced options section, click the Spark tab and verify that the serializer properties are in the Spark config box.

5. Restart the cluster. 
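The copy step the init script performs can be sketched, and verified locally, as a small shell function. The function name and both paths below are placeholders; on a real cluster the destination is /databricks/jars/.

```shell
#!/bin/sh
# Sketch of the init script's copy step. On a Databricks cluster the
# destination directory would be /databricks/jars/, which the driver and
# executor JVMs place on their classpath at startup.
install_serializer_jar() {
  jar_src="$1"    # path to the uploaded JAR (workspace file or volume)
  class_dir="$2"  # directory that is on the JVM classpath at startup
  cp "$jar_src" "$class_dir/" || return 1
}
```

Because the JAR is copied before the JVMs start, the serializer class is available to the executors immediately, rather than being pulled from the driver on first use.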

 

Alternatively, use the following property in the Spark configuration on your compute.

spark.jars <your-jar-file-path>
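For example, with a JAR uploaded to a Unity Catalog volume, the Spark config would include entries like the following. The volume path and registrator class name are hypothetical placeholders.

```
spark.jars /Volumes/main/default/libs/custom-kryo-serializers.jar
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator com.example.MyKryoRegistrator
```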

 

Important

With this option, the JAR may fail to download if you do not have access to its location, or if the compute uses legacy credential passthrough.