Problem
You use a Unity Catalog (UC) standard (formerly shared) access mode cluster to execute a command similar to the following example.
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
df.toJSON().first()
Upon execution, you get the error [NOT_IMPLEMENTED] toJSON() is not implemented. The following screenshot shows the error in the notebook UI.
Cause
toJSON() is not implemented in UC standard mode clusters for security reasons.
Specifically, toJSON() converts a DataFrame into an RDD of strings, and RDD APIs are not supported in UC standard mode clusters.
Solution
Use a dedicated (formerly single-user) access mode cluster instead.
Alternatively, you can use to_json, which returns a JSON string with the STRUCT or VARIANT specified in the expression.
For more information, refer to the to_json function (AWS | Azure | GCP) documentation.