Streaming job fails with a DELTA_MERGE_MATERIALIZE_SOURCE_FAILED_REPEATEDLY error

Disable source data materialization during MERGE statement optimization.

Written by mounika.tarigopula

Last published at: November 17th, 2024

Problem

Your Apache Spark Structured Streaming job fails with the following error.

com.databricks.sql.transaction.tahoe.DeltaRuntimeException: [DELTA_MERGE_MATERIALIZE_SOURCE_FAILED_REPEATEDLY] Keeping the source of the MERGE statement materialized has failed repeatedly.

Cause

Delta Engine tries to optimize the MERGE statement by materializing the source data in memory, and the materialization fails repeatedly. Either the source data is too large to fit in memory, or there are issues with the underlying storage system.

Solution

Disable source data materialization during MERGE statement optimization by applying the following Apache Spark configuration. With materialization disabled, Delta Engine still optimizes the MERGE statement but relies on the regular execution plan to process the data.

spark.databricks.delta.merge.materializeSource none
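
The same setting can likely also be applied at the session level from a notebook, before the streaming query starts. The following is a minimal sketch, assuming a SparkSession object is available (in Databricks notebooks it is normally predefined as spark). The helper name disable_merge_source_materialization is hypothetical and used only for illustration. It is not part of any Databricks or Delta API.

```python
# Configuration key taken from the Solution above.
MATERIALIZE_SOURCE_KEY = "spark.databricks.delta.merge.materializeSource"


def disable_merge_source_materialization(spark):
    """Disable MERGE source materialization for the given session.

    `spark` is assumed to be a SparkSession (or any object exposing the
    same `conf.set` / `conf.get` methods). Returns the value read back
    from the session so the caller can confirm the setting was applied.
    """
    spark.conf.set(MATERIALIZE_SOURCE_KEY, "none")
    return spark.conf.get(MATERIALIZE_SOURCE_KEY)
```

In a notebook, calling disable_merge_source_materialization(spark) before starting the stream should apply the setting to MERGE statements issued from that session, including those run inside a foreachBatch function. Setting the value in the cluster's Spark configuration, as shown above, applies it to every session on the cluster instead.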