When you execute a Delta Lake MERGE query that uses Python UDFs in any of its transformations, it fails with the following exception:
java.lang.UnsupportedOperationException: Error in SQL statement: IllegalStateException: File (s3a://xxx/table1) to be rewritten not found among candidate files: s3a://xxx/table1/part-00001-39cae1bb-9406-49d2-99fb-8c865516fbaa-c000.snappy.parquet
Delta Lake internally depends on the input_file_name() function for operations like MERGE. input_file_name() returns an empty value if you use it in a SELECT statement that evaluates a Python UDF. MERGE runs a SELECT internally, which then fails to return file names and leads to the error. This error does not occur with Scala UDFs.
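To check whether a cluster is affected, the behavior described above can be probed directly. The following is a minimal PySpark sketch, not an official diagnostic: the helper name count_empty_file_names, the table_path argument, and the choice to apply a trivial identity UDF to the first column are all assumptions made for illustration. It selects input_file_name() alongside a Python UDF and counts the rows where the file name is empty.

```python
def count_empty_file_names(spark, table_path):
    """Count rows whose input_file_name() is empty when a Python UDF
    is evaluated in the same SELECT.

    `spark` is an existing SparkSession; `table_path` is the location of a
    Delta table (both supplied by the caller; hypothetical names).
    """
    # Imported lazily so this module can be loaded without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # A trivial Python UDF; its only purpose is to force Python evaluation.
    identity = F.udf(lambda value: value, StringType())

    df = spark.read.format("delta").load(table_path)
    first_column = df.columns[0]

    probed = df.select(
        F.input_file_name().alias("file_name"),
        identity(F.col(first_column).cast("string")).alias("udf_out"),
    )
    # An empty file name is the symptom described in this article.
    return probed.filter(F.col("file_name") == "").count()
```

On an affected runtime the returned count would be expected to be greater than zero for a non-empty table; on a runtime that includes the fix it should be zero.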
You have two options:
- Use Databricks Runtime 6.0 or above, which includes the fix for this issue: [SPARK-28153].
- If you can’t use Databricks Runtime 6.0 or above, use Scala UDFs instead of Python UDFs.
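Before choosing between the two options, it can help to confirm which runtime a job is running on. The sketch below is one possible check, under two assumptions that are not stated in this article: that the cluster exposes its version in the DATABRICKS_RUNTIME_VERSION environment variable, and that the value starts with a numeric major version such as "6.0". The helper name runtime_includes_fix is hypothetical.

```python
import os


def runtime_includes_fix(version, minimum_major=6):
    """Return True if the major version is at least `minimum_major`,
    False if it is lower, and None if the string cannot be parsed."""
    try:
        major = int(str(version).strip().split(".")[0])
    except ValueError:
        return None
    return major >= minimum_major


# Assumption: Databricks clusters set this variable; elsewhere it is absent.
current = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if current is None:
    print("DATABRICKS_RUNTIME_VERSION is not set; cannot determine the runtime.")
else:
    print(current, "includes fix:", runtime_includes_fix(current))
```

If the check returns False, the Scala UDF option is the one that applies; a result of None means the version string did not match the assumed format and should be inspected manually.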