Apache Spark job fails with Failed to parse byte string

Apache Spark job fails with a Failed to parse byte string error.

Written by noopur.nigam

Last published at: May 10th, 2022

Problem

Jobs submitted with spark-submit fail with a Failed to parse byte string: -1 error message.

java.util.concurrent.ExecutionException: java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.
Failed to parse byte string: -1
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:182)
... 108 more
Caused by: java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.
Failed to parse byte string: -1

Cause

The spark.driver.maxResultSize application property is set to a negative value (-1 in the error message above). Spark cannot parse a negative number as a byte size, so the job fails with a NumberFormatException when the property is read.
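
For illustration, the following sketch shows one way this misconfiguration can be introduced. The application name and the -1 value are assumptions for the example; the parse failure does not occur here, but later, when Spark reads the property (for example, during a broadcast exchange, as in the stack trace above).

import org.apache.spark.sql.SparkSession

// A negative value is accepted at configuration time, but it cannot be
// parsed as a byte string, so any action that reads the property fails.
val spark = SparkSession.builder()
  .appName("broken-config-example") // hypothetical app name
  .config("spark.driver.maxResultSize", "-1")
  .getOrCreate()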

Solution

The spark.driver.maxResultSize property defines the maximum total size (in bytes) of the serialized results of all partitions for each Spark action, such as collect. Assign a positive value to set a specific limit, or 0 for an unlimited maximum size. You cannot assign a negative value to this property.
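
As a minimal sketch, the property can be set to a valid byte string when the session is created. The 4g limit and the app name here are illustrative assumptions, not values from the article:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("result-size-example") // hypothetical app name
  .config("spark.driver.maxResultSize", "4g") // positive limit: 4 gibibytes
  // .config("spark.driver.maxResultSize", "0") // alternatively, 0 = unlimited
  .getOrCreate()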

If the total size of the serialized results for a job exceeds the spark.driver.maxResultSize value, the job is aborted.

You should be careful when setting an excessively high (or unlimited) value for spark.driver.maxResultSize. A high limit can cause out-of-memory errors in the driver if the spark.driver.memory property is not set high enough.
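
Because spark.driver.memory must be set before the driver JVM starts, one way to raise it alongside spark.driver.maxResultSize is at launch time, for example with the SparkLauncher API. This is a hedged sketch: the jar path, main class, and the 8g/4g values are assumptions, not recommendations from the article.

import org.apache.spark.launcher.SparkLauncher

// Launch the application with both properties set up front.
val process = new SparkLauncher()
  .setAppResource("/path/to/app.jar")          // hypothetical application jar
  .setMainClass("com.example.MainApp")         // hypothetical main class
  .setConf("spark.driver.memory", "8g")        // assumed driver heap size
  .setConf("spark.driver.maxResultSize", "4g") // keep below driver memory
  .launch()

Passing the same two properties to spark-submit with --conf achieves the same effect.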

See Application Properties in the Spark Configuration documentation (https://spark.apache.org/docs/latest/configuration.html#application-properties) for more details.