CosmosDB-Spark Connector Library Conflict

This article explains how to resolve an issue running applications that use the CosmosDB-Spark connector in the Databricks environment.

Affected versions

Databricks Runtime 4.0 and above (runtimes that include Spark 2.3).

Problem

Normally, if you add a Maven dependency to your Spark cluster, your application can use the required connector libraries. Currently, however, if you simply specify the CosmosDB-Spark connector's Maven coordinates as a dependency for the cluster, you get the following exception:

java.lang.NoClassDefFoundError: Could not initialize class com.microsoft.azure.cosmosdb.Document
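The coordinates in question are the connector's published Maven artifact; the version shown here corresponds to the uber JAR named later in this article:

```
com.microsoft.azure:azure-cosmosdb-spark_2.3.0_2.11:1.2.2
```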

Cause

This occurs because Spark 2.3 ships with jackson-databind 2.6.7.1, whereas the CosmosDB-Spark connector depends on jackson-databind 2.9.5. The two versions conflict on the cluster classpath, and at the executor level you observe the following exception:

java.lang.NoSuchFieldError: ALLOW_TRAILING_COMMA
at com.microsoft.azure.cosmosdb.internal.Utils.<clinit>(Utils.java:69)
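You can spot this kind of conflict by checking whether two versions of the same artifact appear among the JARs on the classpath. The following is a minimal diagnostic sketch in plain Python; the regex and function name are illustrative, not part of any Databricks or Spark API:

```python
# Sketch: flag artifacts that appear in more than one version among a list
# of JAR file names, which signals a classpath conflict like the one above.
import re
from collections import defaultdict

def find_version_conflicts(jar_names):
    """Map artifact name -> set of versions, keeping only conflicting artifacts."""
    versions = defaultdict(set)
    for name in jar_names:
        # Match names like "jackson-databind-2.6.7.1.jar".
        m = re.match(r"(?P<artifact>[a-zA-Z][\w.-]*?)-(?P<version>\d[\w.]*)\.jar$", name)
        if m:
            versions[m.group("artifact")].add(m.group("version"))
    return {artifact: vs for artifact, vs in versions.items() if len(vs) > 1}

jars = [
    "jackson-databind-2.6.7.1.jar",  # shipped with Spark 2.3
    "jackson-databind-2.9.5.jar",    # pulled in by the CosmosDB-Spark connector
    "spark-core_2.11-2.3.0.jar",
]
print(find_version_conflicts(jars))  # flags jackson-databind twice
```

In this example only jackson-databind is reported, because it is the only artifact present in two versions.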

Solution

To avoid this problem:

  1. Directly download the CosmosDB-Spark connector uber JAR (azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar), which bundles the connector together with compatible versions of its dependencies.
  2. Upload the downloaded JAR to your Databricks workspace.
  3. Install the uploaded JAR as a library on your Databricks cluster.
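With the uber JAR installed, reads go through the connector's DataFrame source. The sketch below assembles the read options in a small helper; the endpoint, key, database, and collection values are placeholders, and the option keys follow the connector's documented configuration:

```python
# Sketch: build the options passed to the CosmosDB-Spark DataFrame source.
# All account-specific values below are placeholders, not real credentials.

def cosmosdb_read_config(endpoint, master_key, database, collection):
    """Assemble the options dict for spark.read.format(...).options(**config)."""
    return {
        "Endpoint": endpoint,
        "Masterkey": master_key,
        "Database": database,
        "Collection": collection,
    }

config = cosmosdb_read_config(
    "https://myaccount.documents.azure.com:443/",  # placeholder account URI
    "<master-key>",                                # placeholder key
    "mydb",
    "mycollection",
)

# On a cluster with the uber JAR installed (requires a live SparkSession):
# df = (spark.read
#       .format("com.microsoft.azure.cosmosdb.spark")
#       .options(**config)
#       .load())
```

Because the uber JAR carries its own dependency versions, this read path no longer hits the jackson-databind conflict described above.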

For more information, see Azure Cosmos DB.