Structured Streaming does not process batch size reduction after a failed transaction

Delete the uncommitted offset file from the checkpoint location and restart the stream.

Written by Tarun Sanjeev

Last published at: July 1st, 2025

Problem

You are attempting to adjust the maxFilesPerTrigger or maxBytesPerTrigger settings to control the amount of data processed in each Structured Streaming micro-batch. However, after a failed transaction, these changes do not take effect. The stream continues to use the previous batch settings, ignoring any new configurations.

 

Cause

When a micro-batch fails, the system has already created and stored offset information in the checkpoint directory. These offset files are not automatically overwritten when the stream restarts or when configurations are changed.

 

Rate-limiting options such as maxFilesPerTrigger and maxBytesPerTrigger are applied only when the stream calculates offsets for a new micro-batch. On restart, the stream finds the offset file left by the failed micro-batch, which has no matching commit file, and retries that batch using the offset range already recorded in it. The updated settings therefore have no effect on the retry.

 

As a result, the stream does not adapt to the new rate limits right away: the retried batch covers the same data as the failed attempt. The new settings apply only to micro-batches planned after that batch, not to the retry itself.
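The following toy model may help illustrate the behavior. It is a simplified sketch in plain Python, not Spark's actual implementation: the ToyStream class, its fields, and the file counts are all hypothetical, and a "batch" is reduced to a range of file indexes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Toy model of offset/commit bookkeeping (hypothetical, not Spark code)."""
    total_files: int
    offsets: dict = field(default_factory=dict)  # batch_id -> (start, end) file range
    commits: set = field(default_factory=set)    # batch_ids that completed

    def plan_batch(self, max_files_per_trigger: int) -> tuple:
        """Return (batch_id, start, end) for the next batch to run."""
        latest = max(self.offsets, default=None)
        if latest is not None and latest not in self.commits:
            # An offset without a commit means a retry: the recorded range
            # is reused and the current limit is never consulted.
            start, end = self.offsets[latest]
            return latest, start, end
        # No pending retry: plan a new batch using the current limit.
        batch_id = 0 if latest is None else latest + 1
        start = 0 if latest is None else self.offsets[latest][1]
        end = min(start + max_files_per_trigger, self.total_files)
        self.offsets[batch_id] = (start, end)
        return batch_id, start, end

    def commit(self, batch_id: int) -> None:
        self.commits.add(batch_id)


stream = ToyStream(total_files=5000)
first = stream.plan_batch(max_files_per_trigger=1000)  # planned, then "fails" (never committed)
retry = stream.plan_batch(max_files_per_trigger=100)   # lower limit is ignored on retry
del stream.offsets[0]                                  # remove the uncommitted offset
replanned = stream.plan_batch(max_files_per_trigger=100)
print(first, retry, replanned)
# prints: (0, 0, 1000) (0, 0, 1000) (0, 0, 100)
```

In this model, the lower limit takes effect only once the uncommitted offset entry is gone, which mirrors the remedy described in the Solution section.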

 

Solution

  1. Run the command %fs ls <path-to-checkpoint-location> in a notebook to list the contents of the stream's checkpoint location.
  2. Identify the offset file that does not have a corresponding commit file in the checkpoint location. 
    1. Locate the latest file inside the offsets folder and note its name. 
    2. Confirm there is no file with the same name in the commits folder.
  3. Make a backup of this offset file and store it in an external location so you can return to a known state if needed.
  4. Delete the offset file from its original location in the checkpoint path.
  5. Restart the stream.
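Steps 2 through 4 could be scripted along the following lines. This is a minimal sketch, assuming the stream is stopped and the checkpoint is reachable through a local file path (for example, a mounted path); the function names and the backup file naming are hypothetical. For paths that are only reachable through cloud storage URIs, the same logic would need dbutils.fs.ls, dbutils.fs.cp, and dbutils.fs.rm instead of pathlib and shutil.

```python
import shutil
from pathlib import Path
from typing import Optional


def latest_uncommitted_offset(checkpoint: Path) -> Optional[Path]:
    """Return the newest file in offsets/ if commits/ has no file of the same name."""
    offsets_dir = checkpoint / "offsets"
    # Batch files are named with the integer batch ID; skip anything else
    # (for example, hidden checksum files).
    batch_files = [p for p in offsets_dir.iterdir()
                   if p.is_file() and p.name.isdigit()]
    if not batch_files:
        return None
    latest = max(batch_files, key=lambda p: int(p.name))
    if (checkpoint / "commits" / latest.name).exists():
        return None  # the latest batch was committed; nothing to remove
    return latest


def backup_and_delete(offset_file: Path, backup_dir: Path) -> Path:
    """Copy the offset file to backup_dir, then delete the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / f"offset-{offset_file.name}.bak"
    shutil.copy2(offset_file, backup_path)  # back up first (step 3)
    offset_file.unlink()                    # then delete (step 4)
    return backup_path
```

A typical call would pass the checkpoint path to latest_uncommitted_offset and, only if it returns a file, pass that file and a backup directory outside the checkpoint to backup_and_delete. Reviewing the returned path before deleting is advisable, since removing a committed batch's offset file is not what this procedure intends.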

 

With the uncommitted offset file removed, the stream recalculates the offsets for that micro-batch on restart and applies the new maxFilesPerTrigger or maxBytesPerTrigger setting.

 

Additional considerations

  • Monitor the checkpoint directory for any discrepancies between the offset and commit files. 
  • Implement automated scripts to check for and resolve such mismatches proactively.
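One possible shape for such a check is sketched below. It only reports mismatches and does not delete anything. It assumes the checkpoints are reachable through local file paths, and the function names are hypothetical.

```python
from pathlib import Path


def batch_ids(log_dir: Path) -> set:
    """Collect the integer batch IDs of the files in a checkpoint log folder."""
    if not log_dir.is_dir():
        return set()
    return {int(p.name) for p in log_dir.iterdir()
            if p.is_file() and p.name.isdigit()}


def uncommitted_batches(checkpoint: Path) -> list:
    """Batch IDs that have an offset file but no commit file."""
    pending = batch_ids(checkpoint / "offsets") - batch_ids(checkpoint / "commits")
    return sorted(pending)


def report(checkpoints) -> dict:
    """Map each checkpoint path to its uncommitted batch IDs, omitting clean ones."""
    result = {}
    for cp in checkpoints:
        pending = uncommitted_batches(Path(cp))
        if pending:
            result[str(cp)] = pending
    return result
```

A running stream normally has one batch in flight whose offset file exists before its commit file does, so a non-empty report is a meaningful signal mainly for streams that are stopped or repeatedly failing.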