Structured Streaming job fails with a StreamingQueryException when the source table schema changes

Enable schema tracking and set allowSourceColumnRenameAndDrop to true.

Written by shanmugavel.chandrakasu

Last published at: December 2nd, 2024

Problem

You have a streaming job that ingests data from a Delta table. When columns in the Delta table are renamed or dropped (schema evolution), the job fails with a StreamingQueryException: [STREAM_FAILED] error message.

 

StreamingQueryException: [STREAM_FAILED] Query [id = XXX, runId = XXXX] terminated with exception: The schema, table configuration or protocol of your Delta table has changed during streaming. The schema or metadata tracking log has been updated. Please restart the stream to continue processing using the updated metadata.

 

Cause

Streaming reads from a Delta table cannot automatically continue past a source schema change. If you add, drop, or rename any column in the source table, the streaming job fails with the error above.

 

Solution

Update the schema definition at the source or target as needed, then restart the streaming query to continue processing.

  1. For non-additive schema changes, such as renaming or dropping columns, enable schema tracking. For this to work, the schema must be specified, and each streaming read against a data source must have its own schemaTrackingLocation specified. For more information, review the Rename and drop columns with Delta Lake column mapping (AWS | Azure | GCP) documentation. This ensures that schema changes are properly tracked.
  2. Set spark.databricks.delta.streaming.allowSourceColumnRenameAndDrop to true.
  3. Restart the streaming query. A sketch combining these steps appears after this list.
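
The following is a minimal PySpark sketch of the three steps above. The table paths, checkpoint location, and schema tracking path are hypothetical placeholders; adjust them to your environment.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Step 2: allow the stream to continue across source column renames and drops.
    spark.conf.set(
        "spark.databricks.delta.streaming.allowSourceColumnRenameAndDrop", "true"
    )

    # Step 1: enable schema tracking by giving this streaming read its own
    # schemaTrackingLocation (for example, a path under the query's checkpoint directory).
    df = (
        spark.readStream
        .format("delta")
        .option("schemaTrackingLocation", "/checkpoints/my_query/_schema_log")  # hypothetical path
        .load("/delta/source_table")  # hypothetical source table path
    )

    # Step 3: restart the streaming query against the target.
    query = (
        df.writeStream
        .format("delta")
        .option("checkpointLocation", "/checkpoints/my_query")  # hypothetical checkpoint path
        .start("/delta/target_table")  # hypothetical target table path
    )

Each streaming read needs its own schemaTrackingLocation; do not share one path across multiple queries.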

 

Note

Schema tracking is supported in Databricks Runtime 13.3 LTS and above. Use this configuration only if your workflow involves non-additive schema changes such as renaming or dropping columns; otherwise it is not needed.