Maven Libraries Start Failing with Timed-Out Errors When Updating to Databricks Runtime 11.3 LTS - 15.3 (current)

Whitelist Maven Central and the new Maven repo.

Written by david.vega

Last published at: September 12th, 2024

Problem

When you upgrade Databricks Runtime from an earlier version (9.x - 11.x) to any version from 11.3 LTS through 15.3 (current), Maven library installations start failing with connection timed-out errors when the cluster tries to reach the repository.

Example

Server access error at url https://repo1.maven.org/maven2/com/microsoft/azure/azure-eventhubs-spark_2.12/2.3.22/azure-eventhubs-spark_2.12-2.3.22.pom (java.net.ConnectException: Connection timed out (Connection timed out))

Cause

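As of Databricks Runtime 11.x, Maven libraries resolve in your compute plane by default when you install libraries on a cluster. Your cluster must therefore have network access to Maven Central. If outbound traffic from the compute plane to the repository is blocked, resolution fails with the timeout shown above.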

To review the change, see the Databricks Runtime 11.0 release notes (AWS | Azure | GCP).

Solution

Whitelist Maven Central and the new Maven repo for your cluster to work with this feature. 
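After updating the allow list, it can help to confirm from a notebook attached to the cluster that the repository is actually reachable. The following is a minimal sketch, not an official Databricks tool: the helper name check_repo and its open_fn parameter are hypothetical and exist only for illustration. It uses only the Python standard library.

```python
import socket
import urllib.error
import urllib.request


def check_repo(url, timeout=10, open_fn=urllib.request.urlopen):
    """Return (reachable, detail) for a single repository URL.

    check_repo and open_fn are illustrative names, not part of any
    Databricks API. open_fn can be swapped out to simulate responses.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with open_fn(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, so the network path is open even though
        # the status code indicates an error.
        return True, f"HTTP {err.code}"
    except (urllib.error.URLError, socket.timeout, OSError) as err:
        # Timeouts, refused connections, and DNS failures land here.
        return False, str(err)
```

For example, calling check_repo("https://repo1.maven.org/maven2/") from the cluster should return a first element of True once Maven Central is reachable. A first element of False with a timeout message suggests the traffic is still being blocked.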

If needed, you can revert your cluster to the previous behavior by adding the following line to the cluster's Spark configuration: spark.databricks.libraries.enableMavenResolution false
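If you manage clusters programmatically rather than through the UI, the same revert setting can be carried in the spark_conf map of a cluster specification. The sketch below only builds the payload and does not call any API. The helper name with_legacy_maven_resolution and the example cluster values are hypothetical placeholders; the configuration key is the one given in this article.

```python
import copy
import json

# Configuration key taken from this article.
MAVEN_RESOLUTION_KEY = "spark.databricks.libraries.enableMavenResolution"


def with_legacy_maven_resolution(cluster_spec):
    """Return a copy of cluster_spec with compute-plane Maven resolution turned off.

    Hypothetical helper for illustration; the input dict is not modified.
    """
    updated = copy.deepcopy(cluster_spec)
    # Spark configuration values are passed as strings.
    updated.setdefault("spark_conf", {})[MAVEN_RESOLUTION_KEY] = "false"
    return updated


# Placeholder values; substitute your own cluster definition.
example_spec = {
    "cluster_name": "example-cluster",
    "spark_version": "<runtime-version>",
}

print(json.dumps(with_legacy_maven_resolution(example_spec), indent=2))
```

Copying the specification rather than editing it in place keeps the original definition available if you later want to return to the default behavior.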

For more information, review the Apache Spark settings in the Compute configuration reference (AWS | Azure | GCP) documentation.

You may also wish to whitelist the following Maven repos:

If the issue persists, disable or remove any proxy script that could disrupt Databricks Runtime's connections to the Maven repositories.