Replay Apache Spark events in a cluster

Use a single node cluster to replay another cluster's event log in the Spark UI.

Written by arjun.kaimaparambilrajan

Last published at: February 10th, 2023

The Spark UI is commonly used as a debugging tool for Spark jobs.

If the Spark UI is inaccessible, you can load the event logs in another cluster and use the Event Log Replay notebook to replay the Spark events.


Warning

Cluster log delivery is not enabled by default. You must enable cluster log delivery before starting your cluster, otherwise there will be no logs to replay.

Follow the documentation to configure Cluster log delivery on your cluster.
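If you manage clusters through the Clusters API instead of the UI, log delivery is expressed as a cluster_log_conf block in the cluster specification. The following is a minimal sketch, written as a Python dict, assuming an example DBFS destination of dbfs:/cluster-logs:

# Hypothetical fragment of a Clusters API cluster specification (as a Python dict).
# Only the log delivery portion is shown; the destination is an example value.
cluster_log_conf = {
    "dbfs": {
        "destination": "dbfs:/cluster-logs"
    }
}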

The location of the cluster logs depends on the Cluster Log Path that you set during cluster configuration.

For example, if the log path is dbfs:/cluster-logs, the log files for a specific cluster will be stored in dbfs:/cluster-logs/<cluster-name> and the individual event logs will be stored in dbfs:/cluster-logs/<cluster-name>/eventlog/<cluster-name-cluster-ip>/<log-id>/.


Note

This example uses DBFS for cluster logs, but that is not a requirement. You can store cluster logs in DBFS or S3 storage.

Confirm cluster logs exist

Review the cluster log path and verify that logs are being written for your chosen cluster. Log files are written every five minutes.
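One way to confirm this from a notebook is to list the event log directory with dbutils.fs.ls. This sketch assumes the example dbfs:/cluster-logs path shown earlier; the bracketed values are placeholders for your cluster.

# List the event log files for the cluster you want to replay.
# The path uses the placeholder structure from the example above.
event_log_dir = "dbfs:/cluster-logs/<cluster-name>/eventlog/<cluster-name-cluster-ip>/<log-id>/"

for f in dbutils.fs.ls(event_log_dir):
    print(f.name, f.size)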

Launch a single node cluster

Launch a single node cluster. You will replay the logs on this cluster.

Select the instance type based on the size of the event logs that you want to replay.
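To help with sizing, you can roughly estimate the total event log volume from a notebook before choosing an instance type. This sketch assumes the same example path as above.

# Sum the sizes (in bytes) of the event log files to estimate total volume.
event_log_dir = "dbfs:/cluster-logs/<cluster-name>/eventlog/<cluster-name-cluster-ip>/<log-id>/"

total_bytes = sum(f.size for f in dbutils.fs.ls(event_log_dir))
print(f"Total event log size: {total_bytes / (1024 * 1024):.1f} MiB")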

Run the Event Log Replay notebook

  • Attach the Event Log Replay notebook to the single node cluster.
  • Enter the path to your chosen cluster event logs in the event_log_path field in the notebook (an example path is shown after these steps).
  • Run the notebook.
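Using the example DBFS log path from earlier, the value entered in event_log_path would look something like this (the bracketed values are placeholders for your cluster):

dbfs:/cluster-logs/<cluster-name>/eventlog/<cluster-name-cluster-ip>/<log-id>/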

Event Log Replay notebook

Open notebook in a new tab.

Prevent items getting dropped from the UI

If you have a long-running cluster, some jobs or stages may be dropped from the Spark UI.

This happens due to default UI limits that are intended to prevent the UI from using up too much memory and causing an out-of-memory error on the cluster.

If you are using a single node cluster to replay the event logs, you can increase the default UI limits and devote more memory to the Spark UI. This prevents items from getting dropped.

You can adjust these values during cluster creation by editing the Spark Config.

This example contains the default values for these properties.

spark.ui.retainedJobs 1000
spark.ui.retainedStages 1000
spark.ui.retainedTasks 100000
spark.sql.ui.retainedExecutions 1000
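If you want to confirm which values are in effect, one option is to read them back from the Spark configuration in a notebook. The defaults listed above are used here as fallbacks when a property has not been explicitly set.

# Print the UI retention settings; the second argument to spark.conf.get is the
# value returned when the property has not been explicitly set on the cluster.
for key, default in [
    ("spark.ui.retainedJobs", "1000"),
    ("spark.ui.retainedStages", "1000"),
    ("spark.ui.retainedTasks", "100000"),
    ("spark.sql.ui.retainedExecutions", "1000"),
]:
    print(key, spark.conf.get(key, default))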