Jobs using Apache Spark 3.5.1 and the Elasticsearch Hadoop connector failing with MicroBatchExecution error

Contact the Elastic team for assistance, or use Databricks Runtime 13.3 LTS or below.

Written by Miguel Suarez

Last published at: February 7th, 2025

Problem

When trying to use Apache Spark 3.5.1 with the Elasticsearch Hadoop connector during your Databricks migration, you encounter a compatibility issue between Spark and the connector, and your job fails with the following error.

 


ERROR streaming.MicroBatchExecution - Query reconquery [id = <id>, runId = <runId>] terminated with error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0 in stage 0.0 (TID 0) (executor 0): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
    at org.elasticsearch.spark.sql.streaming.EsStreamQueryWriter.<init>(EsStreamQueryWriter.scala:50)
    at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink.$anonfun$addBatch$5(EsSparkSqlStreamingSink.scala:72)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

 

Cause

The Elasticsearch Hadoop connector is not compatible with Spark version 3.5.1. The issue is on the Elasticsearch side. For more information, refer to the Elastic project's GitHub issue, ES-hadoop is not compatible with spark 3.5.1 #2210.
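The `java.lang.NoSuchMethodError` in the stack trace is a binary-compatibility failure: the connector was compiled against a Spark API (`RowEncoder$.apply`) that Spark 3.5.1 no longer exposes with that signature, so the job starts normally and only fails when the missing method is actually invoked inside a task. The sketch below is purely illustrative (not Spark code, and all class and method names are invented stand-ins) of why this kind of mismatch surfaces at runtime rather than at deployment time.

```python
# Illustrative sketch only: the names below are hypothetical stand-ins, not
# real Spark APIs. A library compiled against one version of an API fails at
# call time (like java.lang.NoSuchMethodError on the JVM) when a newer
# runtime no longer provides that method.

class SparkApiV34:
    """Stands in for the Spark 3.4 API the connector was built against."""
    @staticmethod
    def row_encoder_apply(schema):
        return f"encoder for {schema}"

class SparkApiV35:
    """Stands in for Spark 3.5.1, where that entry point no longer exists."""
    # row_encoder_apply is intentionally absent; calling it raises
    # AttributeError, analogous to the JVM's NoSuchMethodError.

def connector_write(spark_api, schema):
    # The connector unconditionally calls the old entry point.
    return spark_api.row_encoder_apply(schema)

print(connector_write(SparkApiV34, "StructType(...)"))  # works on the old API
try:
    connector_write(SparkApiV35, "StructType(...)")     # fails on the new API
except AttributeError as e:
    print("runtime failure:", e)
```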

 

Solution

Databricks does not own or maintain the Elasticsearch Hadoop connector. To resolve this issue, take one of the following actions:

  1. Reach out to the Elastic team for assistance.
  2. Use Databricks Runtime 13.3 LTS instead, which runs Spark 3.4.1 and is compatible with the connector.
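While migrating, you can also fail fast with a clear message instead of hitting the `NoSuchMethodError` deep inside a streaming task. A minimal sketch, assuming (per the GitHub issue above) that Spark 3.4.x and below work with the connector while 3.5.x does not; on a cluster you would pass `spark.version`, which is the standard way to read the runtime's Spark version:

```python
# Hedged sketch: guard a job against running the Elasticsearch Hadoop
# connector on a Spark version it is known not to support (3.5.x, per the
# GitHub issue). Spark 3.4.x (Databricks Runtime 13.3 LTS) is treated as
# supported.

def es_hadoop_supported(spark_version: str) -> bool:
    """Return True if the connector is expected to work on this Spark version."""
    major, minor = (int(p) for p in spark_version.split(".")[:2])
    return (major, minor) <= (3, 4)

def check_spark_version(spark_version: str) -> None:
    """Raise early, with a clear message, instead of failing mid-stream."""
    if not es_hadoop_supported(spark_version):
        raise RuntimeError(
            f"Spark {spark_version} is not supported by the Elasticsearch "
            "Hadoop connector; use Databricks Runtime 13.3 LTS (Spark 3.4.1)."
        )

# In a notebook you would call: check_spark_version(spark.version)
check_spark_version("3.4.1")  # passes silently on Databricks Runtime 13.3 LTS
```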