Problem
When using a legacy Apache Spark user-defined function (UDF) to create a complex prompt and pass it dynamically to the ai_query()
function, you receive an error. However, if you don’t use the UDF, the ai_query()
function works.
Example code
In the following code, the legacy Spark UDF create_prompt_udf()
and the ai_query()
function are called in the same transformation.
from pyspark.sql import functions as F

# create_prompt_udf is a legacy Spark UDF (defined with F.udf) that builds the prompt string.
result_df = df.withColumn(
    "prompt",
    create_prompt_udf(
        F.col("question"),
        F.col("topic"),
        F.col("category")
    )
).withColumn(
    "answer",
    F.expr("""
        ai_query(
            'databricks-meta-llama-3-1-70b-instruct',
            prompt
        )
    """)
)
Error message
org.apache.spark.SparkException: [INTERNAL_ERROR] Expected udfs have the same evalType but got different evalTypes: 100,400 SQLSTATE: XX000
Cause
Legacy Spark UDFs and the ai_query()
function use different evaluation types (evalTypes), which causes the error.
A legacy Spark UDF has evalType 100
and processes data row by row. The ai_query()
function is a Python UDF with evalType 400,
which triggers a Python runner internally and processes data in batches. Spark expects every UDF evaluated in the same operation to share the same evalType, so combining the two in one transformation raises the internal error shown above.
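As a minimal sketch of the row-by-row side of this mismatch, you can inspect a legacy UDF's evaluation type directly. The build_prompt function below is a hypothetical example, and the evalType attribute reflects open source PySpark behavior; the reported value may differ on some runtimes.

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Hypothetical legacy Spark UDF that builds a prompt string row by row.
def build_prompt(question, topic, category):
    return f"Answer this {category} question about {topic}: {question}"

create_prompt_udf = F.udf(build_prompt, StringType())

# The wrapped UDF exposes its evaluation type; a standard Python UDF
# typically reports 100 (SQL_BATCHED_UDF, row-by-row processing).
print(create_prompt_udf.evalType)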
Solution
Use Unity Catalog (UC) UDFs instead of legacy Spark UDFs. UC UDFs are designed to be compatible with Spark SQL functions such as ai_query()
. Because UC UDFs are evaluated with the same batch processing method as the ai_query()
UDF, there is no conflict in evaluation types.
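The following is a minimal sketch of this approach, assuming a catalog and schema named main.default and the same serving endpoint used in the example above; adjust these names to your environment.

from pyspark.sql import functions as F

# Register the prompt builder as a Unity Catalog SQL UDF instead of a legacy Spark UDF.
# The function name and catalog.schema below are placeholders.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.create_prompt(
        question STRING, topic STRING, category STRING
    )
    RETURNS STRING
    RETURN concat('Answer this ', category, ' question about ', topic, ': ', question)
""")

# Call the UC UDF and ai_query() together; both use the same batch evaluation path,
# so no evalType conflict occurs.
result_df = df.withColumn(
    "answer",
    F.expr("""
        ai_query(
            'databricks-meta-llama-3-1-70b-instruct',
            main.default.create_prompt(question, topic, category)
        )
    """)
)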
For more information, refer to the User-defined functions (UDFs) in Unity Catalog (AWS | Azure | GCP) documentation.