Delta Lake UPDATE query fails with IllegalState exception
Problem
When you execute a Delta Lake UPDATE, DELETE, or MERGE query that uses Python UDFs in any of its transformations, it fails with the following exception:
java.lang.UnsupportedOperationException: Error in SQL statement:
IllegalStateException: File (s3a://xxx/table1) to be rewritten not found among candidate files:
s3a://xxx/table1/part-00001-39cae1bb-9406-49d2-99fb-8c865516fbaa-c000.snappy.parquet
Cause
Delta Lake internally depends on the input_file_name() function for operations like UPDATE, DELETE, and MERGE. input_file_name() returns an empty value if you use it in a SELECT statement that evaluates a Python UDF. UPDATE calls SELECT internally, which then fails to return file names and leads to the error. This error does not occur with Scala UDFs.
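
One way to check whether a cluster is affected is to compare what input_file_name() returns with and without a Python UDF in the same SELECT. The following is a minimal sketch, not an official diagnostic: the helper names, the table path, and the column name are hypothetical, and it assumes an existing SparkSession with Delta Lake available.

```python
def normalize_name(value):
    """Plain Python logic wrapped by the UDF: trim and lower-case a string."""
    if value is None:
        return None
    return value.strip().lower()


def count_empty_file_names(spark, table_path, column, sample_size=100):
    """Return (empty_without_udf, empty_with_udf) for a sample of rows.

    table_path and column are placeholders for your own Delta table.
    """
    # Imported here so normalize_name stays usable without a Spark install.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    normalize_udf = F.udf(normalize_name, StringType())
    df = spark.read.format("delta").load(table_path)

    # SELECT that only asks for the source file of each row.
    without_udf = df.select(F.input_file_name().alias("file"))

    # Same SELECT, but it also evaluates a Python UDF.
    with_udf = df.select(
        F.input_file_name().alias("file"),
        normalize_udf(F.col(column)).alias("normalized"),
    )

    def empty_count(frame):
        # collect() keeps every selected column, so the UDF is really evaluated.
        rows = frame.limit(sample_size).collect()
        return sum(1 for row in rows if not row["file"])

    return empty_count(without_udf), empty_count(with_udf)
```

If the second count is greater than zero while the first is zero, the Python UDF is most likely what causes the file names to go missing.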
Solution
You have two options:
- Use Databricks Runtime 6.0 or above, which includes the resolution to this issue: [SPARK-28153].
- If you can’t use Databricks Runtime 6.0 or above, use Scala UDFs instead of Python UDFs.
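
If the rest of your job stays in Python, the second option can be sketched as follows: compile the UDF logic into a JVM class, attach the JAR to the cluster, and register it from PySpark with spark.udf.registerJavaFunction so that no Python UDF is evaluated during the UPDATE. The class name com.example.udfs.NormalizeName, the SQL function name, and the table and column names are all hypothetical; the class is assumed to implement Spark's Java UDF1 interface.

```python
def register_jvm_udf(spark, sql_name, java_class_name, return_type="string"):
    """Register a JVM (Scala or Java) UDF under sql_name and return that name.

    java_class_name must be on the cluster classpath; it is a placeholder here.
    """
    spark.udf.registerJavaFunction(sql_name, java_class_name, return_type)
    return sql_name


def build_update(table, column, sql_name):
    """Build an UPDATE statement that applies the registered UDF to one column."""
    return f"UPDATE {table} SET {column} = {sql_name}({column})"


def run_update(spark, table, column):
    """Register the hypothetical JVM UDF, then run the UPDATE through Spark SQL."""
    name = register_jvm_udf(spark, "normalize_name", "com.example.udfs.NormalizeName")
    return spark.sql(build_update(table, column, name))
```

Calling run_update(spark, "events", "name") would then issue UPDATE events SET name = normalize_name(name), with the function body running in the JVM rather than in a Python worker.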