Problem
When using a legacy Apache Spark user-defined function (UDF) to create a complex prompt and pass it dynamically to the ai_query()
function, you receive an error. However, if you don’t use the UDF, the ai_query()
function works.
Example code
In the following code, the legacy Spark UDF create_prompt_udf()
and the ai_query()
function are called in the same transformation.
from pyspark.sql import functions as F

# create_prompt_udf is a legacy Spark UDF (defined with F.udf) that builds the prompt string.
result_df = df.withColumn(
    "prompt",
    create_prompt_udf(
        F.col("question"),
        F.col("topic"),
        F.col("category")
    )
).withColumn(
    "answer",
    F.expr("""
        ai_query(
            'databricks-meta-llama-3-1-70b-instruct',
            prompt
        )
    """)
)
Error message
org.apache.spark.SparkException: [INTERNAL_ERROR] Expected udfs have the same evalType but got different evalTypes: 100,400 SQLSTATE: XX000
Cause
Legacy Spark UDFs and the ai_query()
function use different evaluation types (evalTypes), which causes the error.
A legacy Spark UDF has evalType 100
and processes data row by row. The ai_query()
function is a Python UDF with evalType 400,
which triggers a Python runner internally and processes data in batches. Spark expects every UDF evaluated in the same operation to share the same evalType, so combining the two in one transformation raises the internal error shown above.
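As a minimal sketch of the row-by-row side of this mismatch, you can inspect a legacy UDF's evaluation type directly. The build_prompt function below is a hypothetical example, and the evalType attribute reflects open source PySpark behavior; the reported value may differ on some runtimes.

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Hypothetical legacy Spark UDF that builds a prompt string row by row.
def build_prompt(question, topic, category):
    return f"Answer this {category} question about {topic}: {question}"

create_prompt_udf = F.udf(build_prompt, StringType())

# The wrapped UDF exposes its evaluation type; a standard Python UDF
# typically reports 100 (SQL_BATCHED_UDF, row-by-row processing).
print(create_prompt_udf.evalType)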
Solution
Use Unity Catalog (UC) UDFs instead of legacy Spark UDFs. UC UDFs are designed to be compatible with Spark SQL functions such as ai_query()
. Because UC UDFs are evaluated with the same batch processing method as the ai_query()
UDF, there is no conflict in evaluation types.
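The following is a minimal sketch of this approach, assuming a catalog and schema named main.default and the same serving endpoint used in the example above; adjust these names to your environment.

from pyspark.sql import functions as F

# Register the prompt builder as a Unity Catalog SQL UDF instead of a legacy Spark UDF.
# The function name and catalog.schema below are placeholders.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.create_prompt(
        question STRING, topic STRING, category STRING
    )
    RETURNS STRING
    RETURN concat('Answer this ', category, ' question about ', topic, ': ', question)
""")

# Call the UC UDF and ai_query() together; both use the same batch evaluation path,
# so no evalType conflict occurs.
result_df = df.withColumn(
    "answer",
    F.expr("""
        ai_query(
            'databricks-meta-llama-3-1-70b-instruct',
            main.default.create_prompt(question, topic, category)
        )
    """)
)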
For more information, refer to the User-defined functions (UDFs) in Unity Catalog (AWS | Azure | GCP) documentation.