Streaming job has degraded performance

Streaming job has poor performance after stopping and restarting from the same checkpoint.

Written by ashish

Last published at: May 11th, 2022

Problem

You have a streaming job whose performance degrades over time.

You start a new streaming job with the same configuration and same source, and it performs better than the existing job.

Cause

Issues with old checkpoints can result in performance degradation in long-running streaming jobs.

This can happen if the job was intermittently halted and restarted from the same checkpoint.

You can validate the issue by reviewing the latest micro-batch offset sequence number in the job's checkpoint directory.
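
One way to read that number is to look at the offsets subdirectory of the checkpoint, where Structured Streaming writes one file per micro-batch, named by batch ID. The sketch below is a minimal illustration, assuming the checkpoint is reachable as a local or mounted file path. The helper name latest_batch_id is hypothetical, and this article does not state a threshold, so treat the result as something to compare against a freshly started job rather than a pass/fail value.

```python
import os
from typing import Optional


def latest_batch_id(checkpoint_dir: str) -> Optional[int]:
    """Return the highest micro-batch ID found under <checkpoint_dir>/offsets.

    Hypothetical helper: assumes the checkpoint directory is readable
    through the local file system (for example via a mount point).
    """
    offsets_dir = os.path.join(checkpoint_dir, "offsets")
    # Offset log files are named by batch ID (0, 1, 2, ...).
    # Checksum and temporary files have non-numeric names and are skipped.
    batch_ids = [
        int(name)
        for name in os.listdir(offsets_dir)
        if name.isascii() and name.isdigit()
    ]
    return max(batch_ids) if batch_ids else None
```

For a checkpoint on cloud storage that is not mounted, the same logic applies, but the directory listing would have to come from the storage client you use instead of os.listdir.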

Solution

  • Change the checkpoint directory.
  • Avoid restarting old streaming jobs with the same checkpoint directories.
  • If you cannot change the checkpoint directory, increase the cluster capacity.
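
As a rough illustration of the first bullet, the sketch below builds a new, unused checkpoint path for each fresh start of a job. The helper name fresh_checkpoint_path, the base path, and the job name are all hypothetical. Note that a job started with a new checkpoint directory does not resume from the old offsets or state, so where it begins reading depends on the source's starting-position settings.

```python
from datetime import datetime, timezone


def fresh_checkpoint_path(base_path: str, job_name: str) -> str:
    """Build a checkpoint path that has not been used by an earlier run.

    Hypothetical helper: appends a UTC timestamp to the job name so each
    fresh start gets its own directory under base_path.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{base_path.rstrip('/')}/{job_name}-{stamp}"


# Usage sketch (assumes an existing streaming DataFrame named df;
# the path and job name below are placeholders):
#
# query = (
#     df.writeStream
#       .option("checkpointLocation",
#               fresh_checkpoint_path("/mnt/checkpoints", "orders"))
#       .start()
# )
```

Keep the generated path somewhere durable (for example in job configuration) if the job must later recover from failures using that same checkpoint.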