This article explains how to resolve an issue that prevents applications using the CosmosDB-Spark connector from running in the Databricks environment.
Problem
Normally, adding a Maven dependency to your Spark cluster makes the required connector libraries available to your application. Currently, however, if you simply specify the CosmosDB-Spark connector's Maven coordinates as a dependency for the cluster, you get the following exception:
java.lang.NoClassDefFoundError: Could not initialize class com.microsoft.azure.cosmosdb.Document
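For example, even a minimal read through the connector fails at class-initialization time. The following sketch assumes a Databricks notebook (where `spark` is predefined); the endpoint, key, database, and collection values are placeholders for your own Cosmos DB account:

```scala
// Sketch of the failure mode, with the connector attached only via its
// Maven coordinates. All connection values below are placeholders.
val cosmosConfig = Map(
  "Endpoint"   -> "https://<your-account>.documents.azure.com:443/",
  "Masterkey"  -> "<your-master-key>",
  "Database"   -> "<your-database>",
  "Collection" -> "<your-collection>"
)

// Fails with java.lang.NoClassDefFoundError for
// com.microsoft.azure.cosmosdb.Document.
val df = spark.read
  .format("com.microsoft.azure.cosmosdb.spark")
  .options(cosmosConfig)
  .load()
```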
Cause
This occurs because Spark 2.3 ships with jackson-databind-2.6.7.1, whereas the CosmosDB-Spark connector depends on jackson-databind-2.9.5. The ALLOW_TRAILING_COMMA parser feature the connector references was only introduced in Jackson 2.9, so with the older Jackson libraries on the classpath the connector's static initializer fails, and at the executor level you observe the following exception:
java.lang.NoSuchFieldError: ALLOW_TRAILING_COMMA
    at com.microsoft.azure.cosmosdb.internal.Utils.<clinit>(Utils.java:69)
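To confirm which Jackson version a cluster actually loads, you can run a quick diagnostic in a Scala notebook. This is a sketch, not part of the fix; it relies on ObjectMapper shipping in jackson-databind:

```scala
// Diagnostic sketch: print the JAR that jackson-databind was loaded from.
// ObjectMapper lives in jackson-databind, so its code source identifies
// the version that won on the classpath.
val jacksonJar = classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain
  .getCodeSource
  .getLocation
println(jacksonJar) // e.g. a jackson-databind-2.6.7.1 JAR on a Spark 2.3 cluster
```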
Solution
To avoid this problem:
- Directly download the CosmosDB-Spark connector Uber JAR: azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar.
- Upload the downloaded JAR to Databricks following the instructions in Upload a Jar, Python egg, or Python wheel (AWS | Azure).
- Install the uploaded library as a Cluster-installed library (AWS | Azure). The sketch after this list shows one way to confirm the installation worked.
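Once the uber JAR is installed on the cluster, a read through the connector should succeed. The following sketch uses the connector's documented Scala read API; the endpoint, key, database, and collection values are placeholders:

```scala
// Smoke test after installing the uber JAR. Connection values are placeholders.
import com.microsoft.azure.cosmosdb.spark.config.Config
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark._

val readConfig = Config(Map(
  "Endpoint"   -> "https://<your-account>.documents.azure.com:443/",
  "Masterkey"  -> "<your-master-key>",
  "Database"   -> "<your-database>",
  "Collection" -> "<your-collection>"
))

// With the connector's bundled dependencies in place, this read should
// succeed instead of throwing NoClassDefFoundError for
// com.microsoft.azure.cosmosdb.Document.
val df = spark.read.cosmosDB(readConfig)
df.printSchema()
```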