Error when trying to use RDD code in shared clusters

Use a single-user cluster, which supports RDD functionality.

Written by mounika.tarigopula

Last published at: January 31st, 2025

Problem

When you try to run Resilient Distributed Dataset (RDD) code on a shared cluster, you receive an error.

Error: Method public org.apache.spark.rdd.RDD org.apache.spark.api.java.JavaRDD.rdd() is not allowlisted on class class org.apache.spark.api.java.JavaRDD
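
For reference, a minimal snippet that touches a DataFrame's underlying RDD is enough to trigger the error. The following PySpark lines are a hypothetical example, not taken from a specific workload:

df = spark.range(5)
df.rdd.count()  # Fails on a shared cluster: JavaRDD.rdd() is not allowlisted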

Cause

Databricks Runtime clusters with Unity Catalog enabled do not support the RDD API in shared access mode. On shared clusters, calls into the underlying RDD, such as JavaRDD.rdd(), are not on the allowlist and are rejected.

Solution

Use a single-user cluster instead, which supports RDD functionality.

If you want to continue using a shared cluster, use the DataFrame API instead of the RDD API. For example, you can use spark.createDataFrame to create DataFrames, as shown in the sketch below.
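
A minimal sketch of the DataFrame approach, assuming a PySpark notebook with an active SparkSession named spark (the sample rows and column names here are hypothetical):

# RDD-based construction, which fails on shared clusters:
# rdd = spark.sparkContext.parallelize([("Alice", 34), ("Bob", 45)])

# DataFrame-based construction, which works on shared clusters:
df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])
df.show()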

For more information on creating DataFrames, refer to the Apache Spark pyspark.sql.SparkSession.createDataFrame documentation.