Job fails with "throttled due to observing atypical errors" message

A job run is throttled and fails with a "throttled due to observing atypical errors" message.

Written by Adam Pavlacka

Last published at: May 11th, 2022

Problem

Your job run fails with a "throttled due to observing atypical errors" error message:

Cluster became unreachable during run Cause: xxx-xxxxxx-xxxxxxx is throttled due to observing atypical errors

Cause

The jobs on this cluster have returned too many large results to the Apache Spark driver node.

As a result, the chauffeur service runs out of memory, and the cluster becomes unreachable.

This can happen after calling the .collect() or .show() APIs, which return results to the driver.

Solution

You can either reduce the workload on the cluster or increase the value of spark.memory.chauffeur.size.

The chauffeur service runs on the same host as the Spark driver. When you allocate more memory to the chauffeur service, less memory is available to the Spark driver.

To set the value of spark.memory.chauffeur.size:

Open the cluster configuration page in your workspace.
Click Edit.
Expand Advanced Options.
In the Spark config field, enter spark.memory.chauffeur.size followed by the new value in megabytes (mb).
Click Confirm and Restart.

Info

The default value for spark.memory.chauffeur.size is 1024 megabytes. This is written as spark.memory.chauffeur.size 1024mb in the Spark configuration. The maximum value is the lesser of 16 GB or 20% of the driver node’s total memory.
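As a rough sketch of the limit described above, the following Python helper computes the largest allowed value of spark.memory.chauffeur.size for a given driver node and builds the matching Spark config line. The function names and the driver memory figures are hypothetical. The sketch assumes 1 GB = 1024 MB and rounds the 20% figure down to whole megabytes; the exact rounding applied by the platform may differ.

```python
def max_chauffeur_size_mb(driver_node_memory_mb: int) -> int:
    """Upper bound for spark.memory.chauffeur.size, in MB.

    Per the article, the maximum is the lesser of 16 GB or 20% of the
    driver node's total memory. Assumes 1 GB = 1024 MB (hypothetical helper).
    """
    if driver_node_memory_mb <= 0:
        raise ValueError("driver_node_memory_mb must be positive")
    sixteen_gb_mb = 16 * 1024
    twenty_percent_mb = driver_node_memory_mb // 5  # 20%, rounded down to whole MB
    return min(sixteen_gb_mb, twenty_percent_mb)


def chauffeur_config_line(size_mb: int, driver_node_memory_mb: int) -> str:
    """Build the Spark config line, rejecting values outside 1..maximum MB."""
    limit = max_chauffeur_size_mb(driver_node_memory_mb)
    if not 0 < size_mb <= limit:
        raise ValueError(f"size_mb must be between 1 and {limit} MB for this driver")
    return f"spark.memory.chauffeur.size {size_mb}mb"


if __name__ == "__main__":
    # Hypothetical driver node with 64 GB of memory: 20% is below the 16 GB cap.
    print(max_chauffeur_size_mb(64 * 1024))
    # Hypothetical driver node with 128 GB of memory: the 16 GB cap applies.
    print(max_chauffeur_size_mb(128 * 1024))
    print(chauffeur_config_line(4096, 64 * 1024))
```

For a 64 GB driver node this yields 13107 MB (20% of 65536 MB, rounded down), while for a 128 GB driver node the 16 GB (16384 MB) cap is the binding limit. The resulting line, such as spark.memory.chauffeur.size 4096mb, is what goes into the Spark config field.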