The Spark UI is commonly used as a debugging tool for Spark jobs.
If the Spark UI is inaccessible, you can load the event logs in another cluster and use the Event Log Replay notebook to replay the Spark events.
Follow the documentation to configure Cluster log delivery on your cluster.
The location of the cluster logs depends on the Cluster Log Path that you set during cluster configuration.
For example, if the log path is dbfs:/cluster-logs, the log files for a specific cluster are stored in dbfs:/cluster-logs/<cluster-id>, and the individual event logs are stored in dbfs:/cluster-logs/<cluster-id>/eventlog/<cluster-id-cluster-ip>/<log-id>/.
Confirm cluster logs exist
Review the cluster log path and verify that logs are being written for your chosen cluster. Log files are written every five minutes.
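For example, you can list the log path from a notebook to confirm that event log files are present. This is a minimal sketch; it assumes the dbfs:/cluster-logs example above, and the cluster ID is a placeholder you need to replace with your own value.

```python
# List the event log directory to confirm that logs are being written.
# The path and cluster ID below are placeholders; substitute your own values.
event_log_root = "dbfs:/cluster-logs/<cluster-id>/eventlog"

for entry in dbutils.fs.ls(event_log_root):
    print(entry.name, entry.size)
```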
Launch a single node cluster
Launch a single node cluster. You will replay the logs on this cluster.
Select the instance type based on the size of the event logs that you want to replay.
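If you are not sure which instance type to pick, one rough starting point is to total the size of the event log files first. The recursive helper below is an illustrative sketch, not part of the Event Log Replay notebook, and it assumes the dbfs:/cluster-logs layout described above.

```python
# Roughly total the event log size to help choose an instance type.
def total_size(path):
    size = 0
    for entry in dbutils.fs.ls(path):
        if entry.name.endswith("/"):   # directory entries are listed with a trailing slash
            size += total_size(entry.path)
        else:
            size += entry.size
    return size

size_gb = total_size("dbfs:/cluster-logs/<cluster-id>/eventlog") / 1e9
print(f"{size_gb:.2f} GB of event logs")
```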
Run the Event Log Replay notebook
- Attach the Event Log Replay notebook to the single node cluster.
- Enter the path to your chosen cluster's event logs in the event_log_path field of the notebook (you can also pass this value programmatically, as sketched after this list).
- Run the notebook.
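If you prefer to drive the replay from another notebook rather than filling in the field manually, the following is a minimal sketch that assumes the Event Log Replay notebook reads event_log_path as a notebook widget; the notebook path and timeout are placeholders.

```python
# Run the Event Log Replay notebook programmatically, passing the event log path
# as a widget argument. Assumes the notebook reads event_log_path via dbutils.widgets.
dbutils.notebook.run(
    "/path/to/Event Log Replay",   # placeholder: path to the notebook in your workspace
    3600,                          # placeholder timeout in seconds
    {"event_log_path": "dbfs:/cluster-logs/<cluster-id>/eventlog/<cluster-id-cluster-ip>/<log-id>/"},
)
```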
Event Log Replay notebook
Prevent items getting dropped from the UI
If you have a long-running cluster, some jobs or stages may be dropped from the Spark UI.
This happens because of default UI limits that are intended to keep the UI from using too much memory and causing an out-of-memory error on the cluster.
If you are using a single node cluster to replay the event logs, you can raise the default UI limits and devote more memory to the Spark UI, which prevents items from being dropped.
You can set these values during cluster creation by editing the Spark Config.
This example shows the default values for these properties; increase them as needed to retain more jobs, stages, and tasks.
spark.ui.retainedJobs 1000
spark.ui.retainedStages 1000
spark.ui.retainedTasks 100000
spark.sql.ui.retainedExecutions 1000
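After the replay cluster starts, you can sanity-check that the values took effect from a notebook. This sketch simply echoes back whatever was set in the cluster's Spark Config.

```python
# Confirm the UI retention settings on the running cluster.
# The fallback string is shown for any property that was not set explicitly.
conf = spark.sparkContext.getConf()
for key in (
    "spark.ui.retainedJobs",
    "spark.ui.retainedStages",
    "spark.ui.retainedTasks",
    "spark.sql.ui.retainedExecutions",
):
    print(key, conf.get(key, "not set"))
```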