Apache Spark UI shows wrong number of jobs

Problem

You are reviewing the number of active Apache Spark jobs on a cluster in the Spark UI, but the number is too high to be accurate.

If you restart the cluster, the number of jobs shown in the Spark UI is correct at first, but over time it grows abnormally high.

Cause

The Spark UI is not always accurate on large or long-running clusters because events can be dropped. The Spark UI relies on a termination event to know when an active job has completed. If that event is missed, due to errors or an unexpected failure, the job stops running but continues to appear as active in the Spark UI.

Solution

Do not use the Spark UI as the source of truth for the number of active jobs on a cluster.

The method sc.statusTracker().getActiveJobIds() in the Spark API is a reliable way to track active jobs. It returns the IDs of all currently active jobs, so the number of IDs returned is the number of active jobs.
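As a minimal sketch, the status tracker call can be wrapped in two small helpers. This example uses PySpark and assumes sc is a live SparkContext. The helper names active_job_ids and active_job_count are hypothetical and are not part of the Spark API. Note that PySpark spells the method getActiveJobsIds(), while the Scala and Java APIs use getActiveJobIds().

```python
def active_job_ids(sc):
    """Return the IDs of the jobs the status tracker reports as active.

    Assumption: `sc` is a live pyspark SparkContext.
    """
    # PySpark names this method getActiveJobsIds (with an extra "s");
    # the Scala/Java equivalent is getActiveJobIds.
    ids = sc.statusTracker().getActiveJobsIds()
    # Sort so the result is stable regardless of the order returned.
    return sorted(ids)


def active_job_count(sc):
    """Return how many jobs are active at the moment of the call."""
    return len(active_job_ids(sc))
```

In an environment where sc is already defined, such as a notebook or the PySpark shell, calling active_job_count(sc) gives the current number of active jobs. The value is a point-in-time snapshot, so poll it if you need to follow the count over time.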

Please review the Spark Status Tracker documentation for more information.