Problem: Delta Lake UPDATE Query Fails with IllegalStateException

Problem

When you execute a Delta Lake UPDATE, DELETE, or MERGE query that uses Python UDFs in any of its transformations, it fails with the following exception:

java.lang.UnsupportedOperationException: Error in SQL statement:
IllegalStateException: File (s3a://xxx/table1) to be rewritten not found among candidate files:
s3a://xxx/table1/part-00001-39cae1bb-9406-49d2-99fb-8c865516fbaa-c000.snappy.parquet

Version

This problem occurs on Databricks Runtime 5.5 and below.
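To check whether a cluster falls in the affected range, a small version comparison along the following lines may help. This is a minimal sketch: the helper names are hypothetical, and it assumes the runtime version is exposed through the DATABRICKS_RUNTIME_VERSION environment variable in a form that starts with "major.minor" (for example "5.5").

```python
import os
import re

# First runtime the article lists as containing the fix.
FIXED_IN = (6, 0)


def parse_runtime_version(version):
    """Return (major, minor) from a string such as '5.5' or '5.5.x-scala2.11'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognized runtime version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def is_affected(version):
    """True if the version is below 6.0, compared numerically rather than as text."""
    return parse_runtime_version(version) < FIXED_IN


# Assumption: this variable is set on the cluster; it is normally absent elsewhere.
current = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if current:
    try:
        print(current, "is affected" if is_affected(current) else "is not affected")
    except ValueError:
        print("Could not parse runtime version:", current)
```

Comparing tuples of integers avoids the pitfall of string comparison, where "10.4" would sort before "6.0".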

Cause

Delta Lake internally depends on the input_file_name() function for operations like UPDATE, DELETE, and MERGE. On the affected runtimes, input_file_name() returns an empty value when it is used in a SELECT statement that evaluates a Python UDF. UPDATE, DELETE, and MERGE each run a SELECT internally, so that SELECT fails to return file names, which leads to the error. This error does not occur with Scala UDFs.
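The behavior described above can be observed directly by selecting input_file_name() next to a Python UDF. The following is a sketch only: the function names, the table path, and the column are hypothetical, and it assumes an existing SparkSession (such as the `spark` object in a notebook) and a Delta table at the given path.

```python
def tag_value(value):
    """Plain Python function that is wrapped as a Python UDF below."""
    return "row-" + str(value)


def show_file_names_with_python_udf(spark, table_path, column):
    """Select input_file_name() alongside a Python UDF and display the result."""
    # Imported here so that defining these helpers does not require Spark.
    from pyspark.sql.functions import col, input_file_name, udf
    from pyspark.sql.types import StringType

    tag_udf = udf(tag_value, StringType())
    df = spark.read.format("delta").load(table_path)
    result = df.select(
        input_file_name().alias("source_file"),
        tag_udf(col(column)).alias("tagged"),
    )
    result.show(truncate=False)
    return result
```

Calling show_file_names_with_python_udf(spark, "/path/to/table1", "id") on an affected runtime would be expected to show an empty source_file column, matching the cause described above. Removing the UDF column from the select should bring the file names back.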

Solution

You have two options:

  • Use Databricks Runtime 6.0 or above, which includes the fix for this issue: [SPARK-28153].
  • If you can’t use Databricks Runtime 6.0 or above, use Scala UDFs instead of Python UDFs.
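If the rest of the job has to stay in Python, one possible way to follow the second option is to compile the UDF logic into a JVM class and register it from PySpark with spark.udf.registerJavaFunction, so that the UPDATE never evaluates a Python UDF. This is a sketch under assumptions, not a confirmed fix from the article: the class name, function name, table, and column are hypothetical, and the class is assumed to be a Scala (or Java) implementation of org.apache.spark.sql.api.java.UDF1 that is already on the cluster classpath.

```python
def build_update_sql(table, column, udf_name):
    """Build an UPDATE statement that applies a registered SQL function to one column."""
    return "UPDATE {t} SET {c} = {f}({c})".format(t=table, c=column, f=udf_name)


def update_with_jvm_udf(spark, table, column, udf_name, java_class):
    """Register a JVM-implemented UDF under udf_name and use it in a Delta Lake UPDATE."""
    # Imported here so that defining these helpers does not require Spark.
    from pyspark.sql.types import StringType

    # java_class is a hypothetical fully qualified class name, for example
    # "com.example.udfs.NormalizeName"; a string return type is assumed.
    spark.udf.registerJavaFunction(udf_name, java_class, StringType())
    spark.sql(build_update_sql(table, column, udf_name))
```

For example, update_with_jvm_udf(spark, "events", "name", "normalize_name", "com.example.udfs.NormalizeName") would run UPDATE events SET name = normalize_name(name) with the function evaluated on the JVM side.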