DLT pipeline is failing with exception DELTA_MERGE_MATERIALIZE_SOURCE_FAILED_REPEATEDLY

Switch from spot instances to on-demand instances.

Written by nikhil.jain

Last published at: August 19th, 2025

Problem

You notice your DLT pipeline begins failing with the following error.

org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = <id>, runId = <run-id>] terminated with exception: [DELTA_MERGE_MATERIALIZE_SOURCE_FAILED_REPEATEDLY] Keeping the source of the MERGE statement materialized has failed repeatedly. SQLSTATE: XXKST

 

Cause

Your pipeline is using spot instances that have been terminated, resulting in the loss of nodes and pipeline failure.

 

To confirm this cause, navigate to your cluster. Click View details > Event log, then filter the events for NODES_LOST.

 

Then click the Spark UI tab to check for the following error message. 

[CHECKPOINT_RDD_BLOCK_ID_NOT_FOUND] Checkpoint block rdd_XXXX not found!
Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted.
If this problem persists, you may consider using `rdd.checkpoint()` instead, which is slower than local checkpointing but more fault-tolerant. SQLSTATE: 56000

 

Solution

Switch to using on-demand instances. In DLT, add the following code snippet to your pipeline's JSON settings.

{
 "clusters": [
  {
   "label": "default",
   "aws_attributes": {
     "availability": "ON_DEMAND"
   },
   "...": "..."
  }
 ]
}

 

If you want to continue to use spot instances, enable Apache Spark decommissioning to decrease the chance of data loss.
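
Decommissioning is enabled through the cluster's Spark configuration. As a sketch, the keys could be set in the pipeline's JSON settings like the on-demand example above (the specific key set below uses standard Apache Spark decommissioning settings; confirm the recommended keys for your workload in the documentation referenced in this article):

```json
{
 "clusters": [
  {
   "label": "default",
   "spark_conf": {
     "spark.decommission.enabled": "true",
     "spark.storage.decommission.enabled": "true",
     "spark.storage.decommission.rddBlocks.enabled": "true",
     "spark.storage.decommission.shuffleBlocks.enabled": "true"
   }
  }
 ]
}
```

With these settings, Spark attempts to migrate cached RDD blocks and shuffle data off an executor before its spot instance is reclaimed, reducing the chance of losing the materialized MERGE source.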

For more information, review the “Decommission spot instances” section of the Manage compute (AWS | Azure | GCP) documentation.