SparkException error when trying to use an Apache Spark UDF to create and dynamically pass a prompt to the ai_query() function

Use Unity Catalog (UC) UDFs instead of Spark UDFs.

Written by vinay.mr

Last published at: January 30th, 2025

Problem

When using a legacy Apache Spark user-defined function (UDF) to create a complex prompt and pass it dynamically to the ai_query() function, you receive an error. However, if you don’t use the UDF, the ai_query() function works. 

 

Example prompt

In the following code, the legacy Spark UDF create_prompt_udf() and the ai_query() function are called in the same transformation.

 

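from pyspark.sql import functions as F

# create_prompt_udf() is a legacy Spark UDF; ai_query() runs in the same transformation.
result_df = df.withColumn(
    "prompt",
    create_prompt_udf(
        F.col("question"),
        F.col("topic"),
        F.col("category")
    )
).withColumn(
    "answer",
    F.expr("""
        ai_query(
            'databricks-meta-llama-3-1-70b-instruct',
            prompt
        )
    """)
)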

 

Error message

org.apache.spark.SparkException: [INTERNAL_ERROR] Expected udfs have the same evalType but got different evalTypes: 100,400 SQLSTATE: XX000

 

Cause

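Legacy Spark UDFs and the ai_query() UDF are evaluated in different ways. As the error message indicates, Spark expects UDFs in the same transformation to share one evalType, so combining the two raises the error.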

 

Legacy Spark UDFs have evalType 100, which identifies a Spark UDF that processes data row by row. The ai_query() UDF has evalType 400, which identifies a Python function that triggers a Python runner internally and uses batch processing.
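
The mismatch can be pictured with a toy model in plain Python. This is only an illustration of the rule the error message describes, not Spark's actual implementation: the function name check_same_eval_type() and the constant names are hypothetical, and the values 100 and 400 are taken from the error message above.

```python
from typing import Iterable

# Values taken from the error message in this article.
LEGACY_SPARK_UDF = 100  # evalType reported for the legacy Spark UDF
AI_QUERY_UDF = 400      # evalType reported for ai_query()


def check_same_eval_type(eval_types: Iterable[int]) -> int:
    """Toy check (hypothetical): return the shared evalType or raise if they differ."""
    distinct = sorted(set(eval_types))
    if len(distinct) != 1:
        raise ValueError(
            "Expected udfs have the same evalType but got different evalTypes: "
            + ",".join(str(t) for t in distinct)
        )
    return distinct[0]
```

Passing two UDFs of the same kind returns their shared evalType, while mixing LEGACY_SPARK_UDF and AI_QUERY_UDF raises an error that mirrors the message shown earlier.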

 

Solution

Use Unity Catalog (UC) UDFs instead of legacy Spark UDFs. UC UDFs are designed to be compatible with functions such as ai_query(). They support the same batch processing method as the ai_query() UDF, so the evaluation types do not conflict.
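
The following is a minimal sketch of how the prompt logic might be moved into a UC Python UDF and then combined with ai_query() in one query. The function name main.default.create_prompt, the source table main.default.questions, the prompt wording, and the helper names are all assumptions made for illustration; replace them with your own catalog, schema, table, and prompt. The helpers only build SQL strings, and run_example() expects the spark session that a Databricks notebook provides.

```python
# Hypothetical three-level name; use a catalog and schema you can write to.
PROMPT_FUNCTION = "main.default.create_prompt"
# Endpoint name taken from the example in this article.
ENDPOINT = "databricks-meta-llama-3-1-70b-instruct"


def create_function_sql(function_name: str = PROMPT_FUNCTION) -> str:
    """Build a CREATE FUNCTION statement for a UC Python UDF.

    The prompt wording is illustrative and assumes the three inputs are non-null.
    """
    return f"""
CREATE OR REPLACE FUNCTION {function_name}(question STRING, topic STRING, category STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return "Topic: " + topic + ". Category: " + category + ". Question: " + question
$$
""".strip()


def answer_query_sql(source_table: str,
                     function_name: str = PROMPT_FUNCTION,
                     endpoint: str = ENDPOINT) -> str:
    """Build a query that passes the UC UDF result to ai_query()."""
    return f"""
SELECT
  *,
  ai_query('{endpoint}', {function_name}(question, topic, category)) AS answer
FROM {source_table}
""".strip()


def run_example(spark, source_table: str = "main.default.questions"):
    """Register the UC UDF, then return a DataFrame with an answer column."""
    spark.sql(create_function_sql())
    return spark.sql(answer_query_sql(source_table))
```

In a notebook, calling run_example(spark) would register the function and return the result DataFrame, assuming the source table has question, topic, and category columns.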

 

For more information, refer to the User-defined functions (UDFs) in Unity Catalog (AWS | Azure | GCP) documentation.