Job ID column not consistently showing values in the Apache Spark UI for Sub Execution IDs

Click on a Sub Execution ID column and then click again to sort and see all the IDs together.

Written by Raghavan Vaidhyaraman

Last published at: January 21st, 2025

Problem

When you’re reviewing your Apache Spark UI to optimize query performance, you notice a Sub Execution ID column in addition to the query ID column in the SQL/DataFrame tab. You also notice that the Job ID column doesn’t consistently show values in the UI, as shown in the following image.

 

 

Cause

A Job ID is generated for a Sub Execution ID only when an action, such as collect()count(), or saveAsTextFile() is triggered. If no such action is required (for example in certain transformations such as map()filter(), or simple metadata fetching), Spark will not create a new Job ID, leaving Sub Execution IDs with no corresponding job.

 

Context 

Sub Execution IDs in the Spark UI are identifiers for individual sub-parts of the queries executed in Spark. A query is divided into multiple Sub Execution IDs to enhance its execution speed. You can review the relationship between Job ID and Sub Execution Job IDs in the SQL/DataFrame tab.

 

Solution

To view all the Sub Execution IDs together in your Spark UI’s SQL/DataFrame tab:

  1. Click the Sub Execution IDs column, which opens in a new window.
  2. Click the Sub Execution IDs column again to sort the column and see all the IDs together.