Problem
When trying to install Maven libraries, you receive an error message.
ClusterLibrariesModel.scala:3569 : [30 occurrences] Error code: ERROR_MAVEN_LIBRARY_RESOLUTION. Library hash: XXXXXX, library type: maven, cluster info: {creator: Webapp, access mode: , Spark version: {DBR version}}. Library installation attempted on the driver node of cluster [ClusterID] org [WorkspaceID] and failed due to library resolution error. Error message: Library resolution failed because unresolved dependency: net.minidev:json-smart:[1.3.1,2.3]: not found
Cause
A recent release of the Maven package json-smart-v2 disrupted dependency resolution. The package's central metadata was corrupted, removing references to all versions prior to 2.5.2. As a result, Maven library resolution fails whenever an earlier version of json-smart is required as a transitive dependency.
For details, refer to the original GitHub issue, 2.5.2 Release Breaking Upstream Dependencies #240.
This issue also affects Google's Maven mirror, which is used by Databricks Runtime versions to resolve Maven libraries. Maven Central is used as the backup for Google's Maven mirror.
Solution
Databricks recommends installing your library separately from the json-smart library. You can use the UI or an API call.
Installing libraries using the UI
Install your library with the json-smart library excluded. Enter the exclusion (net.minidev:json-smart) in the Exclusions text box in the Install library modal.
In the same Install library modal, also install the version of net.minidev:json-smart you require, for example 2.3.
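For example, assuming a hypothetical library com.example:mylib:1.0 whose transitive json-smart dependency fails to resolve, the two library entries in the modal would look like this:

```text
Library 1 (your library, with the exclusion):
  Coordinates: com.example:mylib:1.0
  Exclusions:  net.minidev:json-smart

Library 2 (the pinned json-smart version):
  Coordinates: net.minidev:json-smart:2.3
```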
Installing libraries using an API call
To install the libraries as part of cluster or job creation, add the following "libraries" payload to the /api/2.0/libraries/install API call.
"libraries": [
  {
    "maven": {
      "coordinates": "net.minidev:json-smart:<your-required-version-before-2.5.2>"
    }
  },
  {
    "maven": {
      "coordinates": "<your-library-to-install>",
      "exclusions": [
        "net.minidev:json-smart:RELEASE"
      ]
    }
  }
]
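As a sketch, the full request could look like the following. The workspace URL, token, and cluster ID are placeholders you must replace, and json-smart 2.3 is used here only as an illustrative pre-2.5.2 version.

```shell
# Write the install payload to a file. <cluster-id> and
# <your-library-to-install> are placeholders for your own values.
cat > /tmp/install-libraries.json << 'EOF'
{
  "cluster_id": "<cluster-id>",
  "libraries": [
    { "maven": { "coordinates": "net.minidev:json-smart:2.3" } },
    {
      "maven": {
        "coordinates": "<your-library-to-install>",
        "exclusions": ["net.minidev:json-smart:RELEASE"]
      }
    }
  ]
}
EOF

# Validate that the payload is well-formed JSON before sending it.
python3 -m json.tool /tmp/install-libraries.json > /dev/null && echo "payload OK"

# Send it to the workspace (placeholders: instance URL and token).
# curl -X POST "https://<databricks-instance>/api/2.0/libraries/install" \
#   -H "Authorization: Bearer <token>" \
#   -d @/tmp/install-libraries.json
```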
Host a private Maven mirror
If you host a private Maven mirror, you can set the following Apache Spark configuration in your cluster settings.
spark.databricks.driver.preferredMavenCentralMirrorUrl <mirror-repo>
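For example, if your mirror were served at a hypothetical URL such as https://artifacts.example.com/maven-central, the Spark config entry in the cluster settings would be:

```text
spark.databricks.driver.preferredMavenCentralMirrorUrl https://artifacts.example.com/maven-central
```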
Implement a global init script
Implement a global init script that downloads all the needed JARs, using `mvn dependency:copy-dependencies` from Maven or from your storage location, and moves them to /databricks/jars.
apt update && apt install -y maven
cat > pom.xml << 'EOF'
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>temp</groupId>
  <artifactId>temp</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>com.microsoft.azure</groupId>
      <artifactId>azure-eventhubs-spark_2.12</artifactId>
      <version><a-version-before-2.5.2></version>
      <exclusions>
        <exclusion>
          <groupId>net.minidev</groupId>
          <artifactId>json-smart</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>
EOF
mkdir -p /tmp/jars
mvn -f pom.xml dependency:copy-dependencies -DoutputDirectory=/tmp/jars
cp /tmp/jars/* /databricks/jars
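Because the pom above excludes json-smart entirely, the copied set will not contain any json-smart jar; if your workload needs one, fetch the version you require separately. A quick sanity check (a sketch, using the /tmp/jars directory from the script above) confirms the exclusion took effect:

```shell
# Sanity check: the exclusion in the pom should keep json-smart out of the
# output directory. JAR_DIR defaults to /tmp/jars as used above.
JAR_DIR="${JAR_DIR:-/tmp/jars}"
mkdir -p "$JAR_DIR"
if ls "$JAR_DIR" | grep -q '^json-smart-'; then
    echo "WARNING: json-smart jar still present in $JAR_DIR"
else
    echo "OK: no json-smart jar in $JAR_DIR"
fi
```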
If none of the above options are available, as a last resort you can resolve the library from another Maven mirror. Note that Databricks cannot guarantee the quality or safety of any third-party mirror. To select one, refer to the mirror repository.