Problem
When working with user-defined functions (UDFs) in Apache Spark, you encounter the following error.
pyspark.errors.exceptions.base.PySparkValueError: [UNEXPECTED_TUPLE_WITH_STRUCT] Unexpected tuple {<your-object-value>} with StructType.
Example of problem-creating code
The following code demonstrates a UDF where the returned object value is not a StructType, which leads to the error.
%python
from pyspark.sql.functions import udf
from pyspark.sql.types import StructType
# Define a UDF, which returns your object value instead of StructType.
def faulty_udf(value):
    return {<your-object-value>}
# Register the UDF with StructType, which does not match the object value output above.
faulty_udf_spark = udf(faulty_udf, StructType())
data = [(1,), (2,), (3,)]
df = spark.createDataFrame(data, ["input"])
df_with_faulty_udf = df.withColumn("output", faulty_udf_spark(df["input"]))
df_with_faulty_udf.show()
Cause
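The UDF is registered with a StructType return type, but the value it returns for an individual row has a type that Spark cannot convert to that StructType.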
Solution
Ensure that the values your UDF returns at runtime match the return schema you declare when you register the UDF. Any returned value that does not match the declared schema throws the error.