Problem
Your job run fails with a "throttled due to observing atypical errors" error message:
Cluster became unreachable during run Cause: xxx-xxxxxx-xxxxxxx is throttled due to observing atypical errors
Cause
The jobs on this cluster have returned too many large results to the Apache Spark driver node.
As a result, the chauffeur service runs out of memory, and the cluster becomes unreachable.
This can happen after calling APIs such as .collect() or .show(), which bring results back to the driver.
Solution
You can either reduce the workload on the cluster or increase the value of spark.memory.chauffeur.size.
The chauffeur service runs on the same host as the Spark driver, so any memory you allocate to the chauffeur service reduces the memory available to the Spark driver.
Set the value of spark.memory.chauffeur.size:
- Open the cluster configuration page in your workspace.
- Click Edit.
- Expand Advanced Options.
- In the Spark config field, enter spark.memory.chauffeur.size followed by its value in MB.
- Click Confirm and Restart.
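If you keep cluster settings in scripts instead of editing them in the UI, the following Python sketch shows one way to add the setting. It assumes the Spark settings are held as a dict of string keys and string values, and that the Spark config field takes one space-separated "key value" pair per line. The helper names and the 4096 MB figure are hypothetical and for illustration only; they are not a recommended size.

```python
# Minimal sketch. Helper names and the 4096 MB value are hypothetical.
CHAUFFEUR_KEY = "spark.memory.chauffeur.size"


def with_chauffeur_size(spark_conf: dict[str, str], size_mb: int) -> dict[str, str]:
    """Return a copy of spark_conf with the chauffeur memory size (in MB) set."""
    if size_mb <= 0:
        raise ValueError("size_mb must be a positive number of megabytes")
    updated = dict(spark_conf)  # leave the caller's dict unchanged
    # Spark config values are stored as strings, so convert the integer.
    updated[CHAUFFEUR_KEY] = str(size_mb)
    return updated


def to_spark_config_field(spark_conf: dict[str, str]) -> str:
    """Render settings as one 'key value' pair per line, sorted by key."""
    return "\n".join(f"{key} {value}" for key, value in sorted(spark_conf.items()))


if __name__ == "__main__":
    existing = {"spark.sql.shuffle.partitions": "200"}
    conf = with_chauffeur_size(existing, 4096)
    # Prints the lines you would paste into the Spark config field.
    print(to_spark_config_field(conf))
```

Returning a copy rather than changing the dict in place keeps the original settings available, which is useful if you need to revert after checking the effect on driver memory.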