PySparkValueError when working with UDFs in Apache Spark

Ensure that the Python UDF output matches the schema defined in the source code.

Written by raphael.balogo

Last published at: January 16th, 2025

Problem

When working with user-defined functions (UDFs) in Apache Spark, you encounter the following error. 


pyspark.errors.exceptions.base.PySparkValueError: [UNEXPECTED_TUPLE_WITH_STRUCT] Unexpected tuple {<your-object-value>} with StructType.


Example of problem-creating code 

The following code demonstrates a UDF whose return value does not match the declared StructType, which leads to the error.


%python

from pyspark.sql.functions import udf
from pyspark.sql.types import StructType

# Define a UDF that returns your object value instead of a StructType-compatible row.
def faulty_udf(value):
    return {<your-object-value>}

# Register the UDF with a StructType return type, which does not match the value returned above.
faulty_udf_spark = udf(faulty_udf, StructType())

data = [(1,), (2,), (3,)]
df = spark.createDataFrame(data, ["input"])
df_with_faulty_udf = df.withColumn("output", faulty_udf_spark(df["input"]))
df_with_faulty_udf.show()


Cause

Your UDF returns individual output rows whose type does not match the StructType declared when the UDF was registered. PySpark cannot convert the returned object into the declared schema, so it raises the error.


Solution

Ensure that the values your UDF returns at runtime match the schema declared when the UDF is registered. For a StructType return type, return a tuple, Row, or dict with one entry per declared field. Any returned value that does not match the declared schema will throw the error.