Delta Lake UPDATE query fails with IllegalStateException

Learn how to resolve an issue with Delta Lake UPDATE, DELETE, or MERGE queries that use Python UDFs.

Written by Adam Pavlacka

Last published at: May 10th, 2022

Problem

When you execute a Delta Lake UPDATE, DELETE, or MERGE query that uses Python UDFs in any of its transformations, it fails with the following exception:

AWS

java.lang.UnsupportedOperationException: Error in SQL statement:
IllegalStateException: File (s3a://xxx/table1) to be rewritten not found among candidate files:
s3a://xxx/table1/part-00001-39cae1bb-9406-49d2-99fb-8c865516fbaa-c000.snappy.parquet

Azure

java.lang.UnsupportedOperationException: Error in SQL statement:
IllegalStateException: File (adl://xxx/table1) to be rewritten not found among candidate files:
adl://xxx/table1/part-00001-39cae1bb-9406-49d2-99fb-8c865516fbaa-c000.snappy.parquet

Version

This problem occurs on Databricks Runtime 5.5 and below.
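
To check from a notebook whether a cluster falls in the affected range, a minimal sketch follows. It assumes the cluster exposes its version in the DATABRICKS_RUNTIME_VERSION environment variable in a form such as 5.5 or 6.4. The helper name is_affected_runtime is illustrative and not part of any Databricks API.

```python
import os
import re


def is_affected_runtime(version):
    """Return True if the runtime version string is 5.5 or below."""
    if not version:
        return False  # variable unset, for example when not running on Databricks
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False  # unrecognized format; treat as unknown rather than guess
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) <= (5, 5)


# Assumption: the runtime version is available in this environment variable.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
print(runtime, is_affected_runtime(runtime))
```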

Cause

Delta Lake relies on the input_file_name() function to identify which data files an UPDATE, DELETE, or MERGE operation must rewrite. On affected runtimes, input_file_name() returns an empty value when it is used in a SELECT statement that also evaluates a Python UDF. Because these operations run a SELECT internally, that query returns no file names, Delta Lake cannot match the files to be rewritten against its candidate files, and the operation fails with the error shown above. This error does not occur with Scala UDFs.
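
The behavior described above can be observed directly by selecting input_file_name() next to a Python UDF. The following is a sketch only: the table path /tmp/delta/table1, the column name, and the helper names to_upper and show_source_files are hypothetical, and the Spark imports are deferred so that the plain Python function can be used on its own.

```python
def to_upper(value):
    """Plain Python logic that is wrapped as a Python UDF below."""
    return None if value is None else value.upper()


def show_source_files(spark, table_path, column):
    """Select a Python UDF alongside input_file_name() for a Delta table."""
    # Imported here so that to_upper can be used without Spark installed.
    from pyspark.sql.functions import col, input_file_name, udf
    from pyspark.sql.types import StringType

    upper_udf = udf(to_upper, StringType())
    df = spark.read.format("delta").load(table_path)
    df.select(
        upper_udf(col(column)).alias("udf_result"),
        input_file_name().alias("source_file"),
    ).show(truncate=False)


# In a Databricks notebook, where `spark` is already defined
# (path and column are hypothetical):
# show_source_files(spark, "/tmp/delta/table1", "name")
```

On an affected runtime, the source_file column is expected to be empty for this query. Removing the UDF column from the select should bring the file names back.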

Solution

You have two options:

  • Upgrade to Databricks Runtime 6.0 or above, which includes the fix for this issue: [SPARK-28153].
  • If you can’t upgrade to Databricks Runtime 6.0 or above, use Scala UDFs instead of Python UDFs.
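
If the job has to stay in a Python notebook, one possible way to apply the second option is to implement the logic as a JVM UDF and call it from SQL. The sketch below assumes a Scala class, hypothetically named com.example.udfs.ToUpper, that implements Spark's Java UDF1 interface and is already attached to the cluster as a library. The table name, column name, and helper names are placeholders as well.

```python
def build_update_sql(table, column, udf_name):
    """Build an UPDATE statement that applies a registered SQL function."""
    return "UPDATE {t} SET {c} = {f}({c})".format(t=table, c=column, f=udf_name)


def update_with_jvm_udf(spark, table, column):
    """Register a JVM UDF under a SQL name and use it in a Delta UPDATE."""
    # Imported here so that build_update_sql can be used without Spark installed.
    from pyspark.sql.types import StringType

    # Hypothetical class name: replace with a UDF1 implementation on the cluster.
    spark.udf.registerJavaFunction(
        "to_upper_jvm", "com.example.udfs.ToUpper", StringType()
    )
    spark.sql(build_update_sql(table, column, "to_upper_jvm"))


# In a Databricks notebook, where `spark` is already defined
# (table and column are hypothetical):
# update_with_jvm_udf(spark, "events", "name")
```

Because the function runs in the JVM, the query plan contains no Python UDF evaluation, which is the condition the Cause section ties to the empty file names.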