Problem
When trying to use the Elasticsearch Hadoop connector with Apache Spark 3.5.1 in your Databricks migration efforts, you encounter a compatibility issue between the connector and Spark, and your job fails with the following error.
ERROR streaming.MicroBatchExecution - Query reconquery [id = <id>, runId = <runId>] terminated with error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0 in stage 0.0 (TID 0) (executor 0): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/catalyst/encoders/ExpressionEncoder;
at org.elasticsearch.spark.sql.streaming.EsStreamQueryWriter.<init>(EsStreamQueryWriter.scala:50)
at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink.$anonfun$addBatch$5(EsSparkSqlStreamingSink.scala:72)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Cause
The Elasticsearch Hadoop connector is not compatible with Spark 3.5.1. The connector was compiled against an older Spark API in which `RowEncoder$.apply(StructType)` returned an `ExpressionEncoder`; that method signature no longer exists in Spark 3.5.x, so the JVM throws the `NoSuchMethodError` shown in the stack trace. The issue is on the Elasticsearch side. For more information, refer to the Elastic project's GitHub issue, ES-hadoop is not compatible with spark 3.5.1 #2210.
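A `NoSuchMethodError` means the exact method signature the caller was compiled against is missing at runtime; it is not a missing-jar (classpath) problem. The following Spark-free sketch uses reflection to mimic that lookup; the class and method names are illustrative only, not the connector's actual code.

```java
import java.lang.reflect.Method;

public class EncoderLookup {
    // Returns true when the class exposes a method with this exact name and
    // parameter types, mirroring the JVM's link-time method resolution.
    static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.lang.String has no `apply` method, so resolution fails -- the
        // same way the connector's compiled-in call to
        // RowEncoder$.apply(StructType) fails against Spark 3.5.x.
        System.out.println(hasMethod(String.class, "apply", Object.class));
        // A signature that does exist resolves normally.
        System.out.println(hasMethod(String.class, "substring", int.class));
    }
}
```

Because the break is at the binary level, no runtime configuration works around it; either the connector or the Spark version must change.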
Solution
Databricks does not own or maintain the Elasticsearch Hadoop connector. To solve this issue:
- Reach out to the Elastic team for assistance.
- Use Databricks Runtime 13.3 LTS instead, which runs Spark 3.4.1, a version compatible with the connector.
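If you provision clusters through the Databricks Clusters API or infrastructure-as-code, pinning the runtime is a one-field change. A minimal sketch of a cluster spec is shown below; the `spark_version` key for 13.3 LTS is assumed to be `13.3.x-scala2.12`, and the node type is a placeholder — confirm both against your workspace's available versions.

```json
{
  "cluster_name": "es-hadoop-streaming",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "<your-node-type>",
  "num_workers": 2
}
```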