Problem
When you try to save your data, your Apache Spark job fails with the following error:
Caused by: java.io.IOException: Compressed buffer size exceeds 2147483647. The size of individual input values might be too large. Lower page/block row size checks to write data more often
at org.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:83)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
Cause
You have individual records that exceed the 2 GB (2147483647 bytes) compressed buffer size limit. The Parquet writer groups records together and checks the accumulated block size to determine when to close the row group. Because this size check only runs after a minimum number of records have been buffered, a single oversized record can push the compressed buffer past the limit before the next check runs.
Solution
- Navigate to your cluster.
- Click Advanced options.
- In the Spark config box under the Spark tab, add the following configuration settings to adjust how often the Parquet writer checks the page and block sizes.
spark.hadoop.parquet.page.size.row.check.max 1
spark.hadoop.parquet.page.size.row.check.min 1
spark.hadoop.parquet.block.size.row.check.max 1
spark.hadoop.parquet.block.size.row.check.min 1
These configurations control how many records the Parquet writer buffers between page and row group size checks. Setting them to 1 makes the writer check the sizes after every record, so data is written out before the compressed buffer can exceed the 2 GB limit. The default value for these configurations is 10.
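If you manage the SparkSession yourself (for example, in a standalone PySpark job rather than through the cluster UI), the same settings can be applied programmatically. The following is a minimal sketch; the application name and the input and output paths are placeholders, not values from this article.

from pyspark.sql import SparkSession

# Apply the same Parquet row-count-check settings at session creation time.
# The spark.hadoop.* prefix copies each setting into the Hadoop Configuration
# used by the Parquet writer.
spark = (
    SparkSession.builder
    .appName("parquet-large-record-write")  # placeholder application name
    .config("spark.hadoop.parquet.page.size.row.check.max", "1")
    .config("spark.hadoop.parquet.page.size.row.check.min", "1")
    .config("spark.hadoop.parquet.block.size.row.check.max", "1")
    .config("spark.hadoop.parquet.block.size.row.check.min", "1")
    .getOrCreate()
)

# With the checks running after every record, pages and row groups are
# flushed before a single large record can overflow the compressed buffer.
df = spark.read.json("/path/to/source")  # placeholder input path
df.write.mode("overwrite").parquet("/path/to/output")  # placeholder output path

On Databricks, setting these values in the cluster Spark config, as described in the steps above, is the reliable approach, because the SparkSession already exists by the time your notebook code runs.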