Apache Spark job fails with maxResultSize exception

Learn what to do when an Apache Spark job fails with a maxResultSize exception.

Written by Adam Pavlacka

Last published at: May 11th, 2022

Problem

A Spark job fails with a maxResultSize exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of XXXX tasks (X.0 GB) is bigger than spark.driver.maxResultSize (X.0 GB)

Cause

This error occurs because the configured size limit was exceeded. The limit applies to the total serialized results of Spark actions across all partitions. Such actions include collect() to the driver node, toPandas(), and saving a large file to the driver's local file system.
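As a minimal sketch (assuming a hypothetical DataFrame named df), the following PySpark actions all serialize task results from every partition back to the driver, so their combined size counts against the limit:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100_000_000)  # hypothetical large DataFrame

# Both actions serialize results from all partitions back to the driver;
# if the combined serialized size exceeds spark.driver.maxResultSize, the job fails.
rows = df.collect()   # every row returned to the driver as a Python list
pdf = df.toPandas()   # full DataFrame materialized as pandas on the driver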

Solution

In some situations, you might have to refactor the code to prevent the driver node from collecting a large amount of data. You can change the code so that the driver node collects only a limited amount of data, or you can increase the driver instance memory size. For example, you can call toPandas() with Arrow enabled, or write the data to files and then read those files back, instead of collecting large amounts of data to the driver.
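A minimal sketch of both approaches, assuming a hypothetical DataFrame named df and a hypothetical storage path /tmp/large_result (the Arrow config key shown applies to Spark 3.x):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100_000_000)  # hypothetical large DataFrame

# Option 1: enable Arrow so toPandas() transfers data in compact columnar batches.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
pdf = df.toPandas()

# Option 2: write the result to distributed storage and read it back,
# instead of collecting it to the driver.
df.write.mode("overwrite").parquet("/tmp/large_result")
df2 = spark.read.parquet("/tmp/large_result")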

If absolutely necessary, you can set the property spark.driver.maxResultSize in the cluster Spark config to a value <X>g that is higher than the value reported in the exception message:

spark.driver.maxResultSize <X>g

The default value is 4g. For details, see Application Properties.
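On Databricks, the cluster Spark config shown above is the usual place to set this property, but as a sketch, the same property can also be set when building a SparkSession, before any jobs run (the 8g value is only an illustration):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Illustrative value: choose one higher than the size reported in the exception.
    .config("spark.driver.maxResultSize", "8g")
    .getOrCreate()
)

Note that this property is read when the driver starts, so setting it on an already-running session has no effect.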

If you set a high limit, out-of-memory errors can occur in the driver (depending on spark.driver.memory and the memory overhead of objects in the JVM). Set an appropriate limit to prevent out-of-memory errors.