Problem
An Apache Spark job fails with a maxResultSize exception.
org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of XXXX tasks (X.0 GB) is bigger than spark.driver.maxResultSize (X.0 GB)
Cause
This error occurs when the configured size limit is exceeded. The limit applies to the total serialized results returned to the driver by Spark actions across all partitions, such as collect(), toPandas(), or saving a large file to the driver's local file system.
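For example, both of the following driver-side actions count against this limit. This is an illustrative sketch only; the DataFrame big_df and its source table are hypothetical.

# Hypothetical PySpark example: actions that return all partition results to the driver.
big_df = spark.read.table("samples.large_table")  # assumed table name for illustration

rows = big_df.collect()    # serializes every partition's result on the driver
pdf = big_df.toPandas()    # also materializes the full dataset in driver memory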
Solution
First, refactor your code to prevent the driver node from collecting a large amount of data: either change the code so that the driver collects only a limited amount of data, or increase the driver instance memory size. A sketch of the refactoring approach follows.
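For example, assuming a hypothetical DataFrame big_df, the code could keep the data distributed and write it out, or shrink the result before bringing it to the driver:

# Option 1: write the data out from the executors instead of collecting it.
big_df.write.mode("overwrite").parquet("dbfs:/tmp/example_output")  # path is a placeholder

# Option 2: reduce the result before it reaches the driver.
summary = big_df.groupBy("category").count().collect()  # small aggregate only
preview = big_df.limit(1000).toPandas()                  # bounded number of rows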
If the workload requires collecting a large amount of data and cannot be refactored, you can increase the spark.driver.maxResultSize property in the cluster Spark config (AWS | Azure | GCP) to a value <X> GB higher than the value reported in the exception message.
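For example, to raise the limit to a hypothetical 8 GB (choose a value larger than the size reported in your exception message), add this line to the cluster Spark config and restart the cluster:

spark.driver.maxResultSize 8g

After the cluster restarts, you can confirm the effective value from a notebook with spark.conf.get("spark.driver.maxResultSize").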
If you are facing this issue in BI tools (such as Power BI or Tableau) or in your own custom application that uses a JDBC or ODBC driver, enable Cloud Fetch in your application.
With Cloud Fetch, the executors upload the result sets to the workspace root cloud storage (the Databricks File System); the driver node then generates pre-signed URLs for the results and sends them to the JDBC or ODBC driver.
This workflow lets the client download results directly from cloud storage, so the driver does not have to hold the full result set in its memory.
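As an illustrative sketch only, a Python application using pyodbc might enable Cloud Fetch through the driver's connection settings. The host, HTTP path, and token below are placeholders, and EnableQueryResultDownload is assumed to be the relevant Cloud Fetch setting; confirm the exact setting name and default for your driver version in the documentation referenced below.

import pyodbc

# Assumed connection settings for the Databricks (Simba Spark) ODBC driver;
# verify names and defaults against the driver capability settings documentation.
conn = pyodbc.connect(
    "Driver=Simba Spark ODBC Driver;"
    "Host=<workspace-hostname>;Port=443;HTTPPath=<http-path>;"
    "SSL=1;ThriftTransport=2;AuthMech=3;UID=token;PWD=<personal-access-token>;"
    "EnableQueryResultDownload=1;",  # assumption: 1 enables Cloud Fetch
    autocommit=True,
)
rows = conn.cursor().execute("SELECT * FROM samples.large_table").fetchall()  # hypothetical table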
For more information about Cloud Fetch, refer to the Driver capability settings for the Databricks ODBC Driver (AWS | Azure | GCP) documentation.