Job fails with NoSuchElementException error

NoSuchElementException errors can occur when using Apache Arrow.

Written by ashish

Last published at: March 3rd, 2023


Problem

Your jobs fail intermittently with a NoSuchElementException error.

Example stack trace

Py4JJavaError: An error occurred while calling o2843.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in stage 868.0 failed 4 times, most recent failure: Lost task 17.3 in stage 868.0 (TID 3065) ( executor 6): java.util.NoSuchElementException
at org.apache.spark.sql.vectorized.ColumnarBatch$
at org.apache.spark.sql.vectorized.ColumnarBatch$
at scala.collection.convert.Wrappers$
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)


Cause

The NoSuchElementException error is the result of an issue in the Apache Arrow optimization. Apache Arrow is an in-memory columnar data format used in Spark to efficiently transfer data between the JVM and Python processes.

When Arrow optimization is enabled, there is a possibility that the Py4J interface calls next() on an iterator without first checking iterator.hasNext(). If the iterator is empty, this results in a NoSuchElementException error.
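The underlying pattern can be illustrated in plain Python (a minimal analogy, not the actual Py4J code): consuming an iterator without checking whether it has another element fails when the iterator is empty. On the JVM this surfaces as java.util.NoSuchElementException; Python's equivalent is StopIteration.

```python
# Minimal analogy for the JVM-side bug: consuming an iterator without a
# hasNext()-style check first. An empty Arrow batch behaves like this
# empty iterator; on the JVM the failure surfaces as
# java.util.NoSuchElementException (Python's equivalent is StopIteration).
batch_rows = iter([])  # stand-in for a zero-row Arrow batch

try:
    next(batch_rows)  # consumed without checking for a next element
except StopIteration:
    print("empty batch: nothing to consume")
```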


Solution

Set spark.databricks.pyspark.emptyArrowBatchCheck to true in the cluster's Spark config (AWS | Azure | GCP).


Enabling spark.databricks.pyspark.emptyArrowBatchCheck prevents a NoSuchElementException error from occurring when the Arrow batch size is 0.
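In the Spark config text box on the cluster configuration page, the property is entered as a space-separated key-value pair:

```
spark.databricks.pyspark.emptyArrowBatchCheck true
```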

Alternatively, you can disable Arrow optimization by setting the following properties in your cluster's Spark config.
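For reference, the standard Arrow enablement flags are shown below (spark.sql.execution.arrow.pyspark.enabled applies to Spark 3.x; the older spark.sql.execution.arrow.enabled name is the Spark 2.x equivalent, retained for compatibility). Setting them to false in the cluster's Spark config disables the optimization:

```
spark.sql.execution.arrow.pyspark.enabled false
spark.sql.execution.arrow.enabled false
```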


Disabling Arrow optimization may have performance implications.
