Problem
When you run a Delta Lake UPDATE, DELETE, or MERGE query that uses a Python UDF in any of its transformations, the query fails with the following exception:
AWS
java.lang.UnsupportedOperationException: Error in SQL statement: IllegalStateException: File (s3a://xxx/table1) to be rewritten not found among candidate files: s3a://xxx/table1/part-00001-39cae1bb-9406-49d2-99fb-8c865516fbaa-c000.snappy.parquet
Azure
java.lang.UnsupportedOperationException: Error in SQL statement: IllegalStateException: File (adl://xxx/table1) to be rewritten not found among candidate files: adl://xxx/table1/part-00001-39cae1bb-9406-49d2-99fb-8c865516fbaa-c000.snappy.parquet
Version
This problem occurs on Databricks Runtime 5.5 and below.
Cause
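Delta Lake relies on the input_file_name() function to identify which data files an UPDATE, DELETE, or MERGE operation must rewrite. However, the input_file_name() function returns an empty value when it is used in a SELECT statement that also evaluates a Python UDF. Because these operations run such a SELECT internally to collect the file names of the matching rows, they receive empty names instead of real file paths, cannot match them against the table's candidate files, and fail with the error above. This error does not occur with Scala UDFs.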
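To make the failure mode concrete, here is a simplified Python model of the lookup step: the file names reported for matching rows are checked against the set of candidate data files. This is a sketch for illustration only, not Delta Lake's implementation. The names touched_files and FileNotAmongCandidatesError are hypothetical, and the rule that an empty file name resolves to the table root is an assumption inferred from the path shown in the error message (File (s3a://xxx/table1) rather than a part file).

```python
from typing import Iterable, List, Set


class FileNotAmongCandidatesError(Exception):
    """Hypothetical stand-in for the IllegalStateException in the error message."""


def touched_files(table_path: str, file_names: Iterable[str], candidate_files: Set[str]) -> List[str]:
    """Return the distinct candidate files that contain matching rows.

    Simplified, hypothetical model of the lookup step; not Delta Lake's code.
    """
    touched: List[str] = []
    for name in file_names:
        # Assumption: an empty input_file_name() value resolves to the table
        # root, which is never one of the candidate data files.
        absolute = name if name else table_path
        if absolute not in candidate_files:
            raise FileNotAmongCandidatesError(
                f"File ({absolute}) to be rewritten not found among candidate files: "
                + ", ".join(sorted(candidate_files))
            )
        if absolute not in touched:
            touched.append(absolute)
    return touched


TABLE = "s3a://xxx/table1"
PART = TABLE + "/part-00001-39cae1bb-9406-49d2-99fb-8c865516fbaa-c000.snappy.parquet"

# Rows read without a Python UDF carry their real file name, so the lookup succeeds.
print(touched_files(TABLE, [PART, PART], {PART}))

# Rows evaluated through a Python UDF (Databricks Runtime 5.5 and below) carry "".
try:
    touched_files(TABLE, [""], {PART})
except FileNotAmongCandidatesError as err:
    print(err)
```

In this model the second call fails with a message of the same shape as the one in the Problem section, because the empty name resolves to the table path, which is not a data file.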
Solution
You have two options:
- Use Databricks Runtime 6.0 or above, which includes the fix for this issue ([SPARK-28153]).
- If you can’t use Databricks Runtime 6.0 or above, use Scala UDFs instead of Python UDFs.