Problem
When you’re reviewing your Apache Spark UI to optimize query performance, you notice a Sub Execution ID column in addition to the query ID column in the SQL/DataFrame tab. You also notice that the Job ID column doesn’t consistently show values, as shown in the following image.
Cause
A Job ID is generated for a Sub Execution ID only when an action, such as collect(), count(), or saveAsTextFile(), is triggered. If no action is required (for example, for transformations such as map() or filter(), or for simple metadata fetching), Spark does not create a new Job ID, leaving those Sub Execution IDs without a corresponding job.
Context
Sub Execution IDs in the Spark UI identify the individual sub-executions that make up a query. Spark divides a query into multiple sub-executions to speed up its execution, and each sub-execution gets its own ID. You can review the relationship between Job IDs and Sub Execution IDs in the SQL/DataFrame tab.
Solution
To view all the Sub Execution IDs together in your Spark UI’s SQL/DataFrame tab:
- Click the Sub Execution IDs column. The sub-executions open in a new window.
- Click the Sub Execution IDs column again to sort it and view all the IDs together.