Problem
When you try to run Resilient Distributed Dataset (RDD) code on a shared cluster, you receive an error.
Error:

```
Method public org.apache.spark.rdd.RDD org.apache.spark.api.java.JavaRDD.rdd() is not allowlisted on class class org.apache.spark.api.java.JavaRDD
```
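The following PySpark sketch illustrates the kind of RDD usage that can surface this error. The data and variable names are illustrative, and the exact call that fails can vary by Databricks Runtime version.

```python
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` already exists; getOrCreate() is
# included here only so the sketch is self-contained.
spark = SparkSession.builder.getOrCreate()

# Collecting an RDD goes through JavaRDD.rdd() under the hood, which is
# not allowlisted on a shared cluster with Unity Catalog enabled.
rdd = spark.sparkContext.parallelize([("Alice", 34), ("Bob", 45)])
rdd.collect()  # fails on a shared cluster
```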
Cause
Databricks Runtime versions with Unity Catalog enabled do not support RDDs on shared clusters.
Solution
Use a single-user cluster instead; single-user clusters support the RDD API.
If you want to continue using a shared cluster, use the DataFrame API instead of the RDD API. For example, you can use spark.createDataFrame to create DataFrames, as in the sketch below.
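This minimal sketch builds a DataFrame directly with spark.createDataFrame; the data and column names are illustrative.

```python
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` already exists; getOrCreate() is
# included here only so the sketch is self-contained.
spark = SparkSession.builder.getOrCreate()

# Build a DataFrame directly instead of parallelizing an RDD first.
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)],
    ["name", "age"],
)
df.show()
```

DataFrame operations such as show(), select(), and groupBy() are supported on shared clusters, so no conversion to an RDD is needed.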
For more information on creating DataFrames, refer to the Apache Spark pyspark.sql.SparkSession.createDataFrame documentation.