Job fails with atypical errors message
Problem
Your job run fails with a throttled due to observing atypical errors message:
Cluster became unreachable during run Cause: xxx-xxxxxx-xxxxxxx is throttled due to observing atypical errors
Cause
The jobs on this cluster have returned too many large results to the Apache Spark driver node.
As a result, the chauffeur service runs out of memory, and the cluster becomes unreachable.
This can happen after calling the .collect or .show API.
Solution
You can either reduce the workload on the cluster or increase the value of spark.memory.chauffeur.size.
The chauffeur service runs on the same host as the Spark driver. When you allocate more memory to the chauffeur service, less overall memory will be available for the Spark driver.
Set the value of spark.memory.chauffeur.size:
- Open the cluster configuration page in your workspace.
- Click Edit.
- Expand Advanced Options.
- Enter the value of spark.memory.chauffeur.size in MB in the Spark Config field.
- Click Confirm and Restart.
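Concretely, the steps above amount to adding a line like the following to the Spark Config field. The 2048mb value here is an example only; choose a value that fits your driver node's memory.

```
spark.memory.chauffeur.size 2048mb
```

Remember that this memory is taken from the same host as the Spark driver, so raising it reduces what is left for the driver itself.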
Note
The default value for spark.memory.chauffeur.size is 1024 megabytes. This is written as spark.memory.chauffeur.size 1024mb in the Spark configuration. The maximum value is the lesser of 16 GB or 20% of the driver node’s total memory.