CosmosDB-Spark connector library conflict

Learn how to resolve conflicts that arise when using the CosmosDB-Spark connector library with Databricks.

Written by Adam Pavlacka

Last published at: June 1st, 2022

This article explains how to resolve an issue running applications that use the CosmosDB-Spark connector in the Databricks environment.

Problem

Normally, if you add a Maven dependency to your Spark cluster, your application can use the required connector libraries. Currently, however, if you simply specify the CosmosDB-Spark connector’s Maven coordinates as a dependency for the cluster, you get the following exception:

java.lang.NoClassDefFoundError: Could not initialize class com.microsoft.azure.cosmosdb.Document
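
For example, a read through the connector fails as soon as the Document class is initialized. The following Scala sketch assumes the connector was attached via Maven coordinates only; the account, key, database, and collection values are placeholders:

// Minimal read that hits the NoClassDefFoundError when only the connector's
// Maven coordinates are attached to the cluster. All connection values are
// placeholders; `spark` is the SparkSession predefined in Databricks notebooks.
val cosmosConfig = Map(
  "Endpoint"   -> "https://<your-account>.documents.azure.com:443/",
  "Masterkey"  -> "<your-master-key>",
  "Database"   -> "<your-database>",
  "Collection" -> "<your-collection>"
)

val df = spark.read
  .format("com.microsoft.azure.cosmosdb.spark")
  .options(cosmosConfig)
  .load()

df.show() // fails: java.lang.NoClassDefFoundError ... com.microsoft.azure.cosmosdb.Document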

Cause

This occurs because Spark 2.3 uses jackson-databind 2.6.7.1, whereas the CosmosDB-Spark connector requires jackson-databind 2.9.5. The resulting library conflict surfaces at the executor level as the following exception:

java.lang.NoSuchFieldError: ALLOW_TRAILING_COMMA
at com.microsoft.azure.cosmosdb.internal.Utils.<clinit>(Utils.java:69)
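
If you want to confirm the conflict on a running cluster, a quick check from a Scala notebook cell (a minimal sketch) shows which jackson-databind copy the classpath actually resolves, and which JAR it was loaded from:

// Print the jackson-databind version the cluster classpath resolves.
// On an affected cluster this reports 2.6.7.x (Spark's bundled copy),
// not the 2.9.5 the connector requires.
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.cfg.PackageVersion

println(PackageVersion.VERSION) // e.g. 2.6.7.1

// Show which JAR on the classpath the class was actually loaded from.
val source = classOf[ObjectMapper].getProtectionDomain.getCodeSource.getLocation
println(source)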

Solution

To avoid this problem:

  1. Directly download the CosmosDB-Spark connector uber JAR: azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar.
  2. Upload the downloaded JAR to Databricks following the instructions in Upload a Jar, Python egg, or Python wheel (AWS | Azure).
  3. Install the uploaded JAR as a cluster-installed library (AWS | Azure).
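
Once the uber JAR is installed and the cluster has restarted, you can smoke-test the fix from a Scala notebook cell. This is a minimal sketch that simply forces initialization of the class that previously failed:

// Force static initialization of the class that previously threw
// NoClassDefFoundError; if this succeeds, the connector's bundled
// dependencies are resolving correctly.
Class.forName("com.microsoft.azure.cosmosdb.Document")
println("CosmosDB-Spark connector classes initialized successfully")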

For more information, see Azure Cosmos DB (AWS | Azure).