Problem
You start a streaming job from a data source using Structured Streaming (or Delta Live Tables). You then modify the query to including an additional streaming source using UNION
.
Your job fails with the following error.
Job terminated with exception: assertion failed: There are [X] sources in the checkpoint offsets, and now there are [X+Y] sources requested by the query. Cannot continue. SQLSTATE: XXKST
Cause
The number of streaming sources in the query and the number of sources recorded in the existing checkpoint metadata do not match.
When you create a stream with a single source, a checkpoint is established for this source. Structured Streaming (including Delta Live Tables) uses checkpointing to track progress. Each source's metadata (for example, file offsets or Kafka partitions) is stored in the checkpoint.
If you modify the query to add another streaming source after starting the job, the query now has two sources, but the initial checkpoint only knows about the first source.
Apache Spark detects the inconsistency and throws the Job terminated
error. Spark does not allow the number or type of sources to change across restarts when using the same checkpoint.
Solution
For Spark Streaming, start with a fresh checkpoint when you need to make changes. If you are using a Delta Live Table (DLT) pipeline, perform a full refresh of the pipeline.
For more information, refer to the Run an update in Lakeflow Declarative Pipelines (AWS | Azure | GCP) documentation.