Problem
When you upgrade Databricks Runtime from an earlier version (9.x to 11.x) to any version from 11.3 LTS through 15.3 (the current release at the time of writing), installing Maven libraries fails with connection timeout errors while the cluster connects to the repository.
Example
Server access error at url https://repo1.maven.org/maven2/com/microsoft/azure/azure-eventhubs-spark_2.12/2.3.22/azure-eventhubs-spark_2.12-2.3.22.pom (java.net.ConnectException: Connection timed out (Connection timed out))
Cause
As of Databricks Runtime 11.x, Maven libraries resolve in your compute plane by default when you install them on a cluster. This means your cluster must have network access to Maven Central.
For details of this change, see the Databricks Runtime 11.0 release notes (AWS | Azure | GCP).
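To confirm that this is the cause, you can test from the affected cluster whether Maven Central is reachable over HTTPS. The following Python sketch, meant to be run in a notebook cell attached to the cluster, sends an HTTP HEAD request for the POM from the example error. The function name probe_maven_url and its opener parameter are illustrative assumptions, not part of any Databricks API; the opener parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request

# The POM URL from the example error message in this article.
POM_URL = (
    "https://repo1.maven.org/maven2/com/microsoft/azure/"
    "azure-eventhubs-spark_2.12/2.3.22/azure-eventhubs-spark_2.12-2.3.22.pom"
)


def probe_maven_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send an HTTP HEAD request to url and return a (reachable, detail) tuple.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be tested without real network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, so the network path works; the status code says why it refused.
        return True, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection timeouts, refused connections, and DNS failures end up here.
        return False, repr(exc)
```

On a cluster with working egress, calling probe_maven_url(POM_URL) typically returns True with an HTTP 200 status; a False result whose detail mentions a timeout matches the failure described in this article.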
Solution
Whitelist Maven Central and the new Maven repository in your network's egress rules (for example, your firewall or proxy) so that your cluster can resolve libraries with this feature.
If needed, you can revert your cluster to the previous behavior by setting the following Spark configuration on the cluster:
spark.databricks.libraries.enableMavenResolution false
For more information, see the Apache Spark settings in the Compute configuration reference (AWS | Azure | GCP) documentation.
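If you manage clusters as JSON specifications for the Databricks Clusters API, the same setting goes in the spark_conf field, whose values are strings. The following Python sketch is an illustration under that assumption: it copies a cluster spec and adds the setting from this article without touching other Spark settings. The helper name with_legacy_maven_resolution and every field of example_spec other than spark_conf are hypothetical placeholders.

```python
import json

# Spark config key quoted in this article; spark_conf values are strings.
MAVEN_RESOLUTION_KEY = "spark.databricks.libraries.enableMavenResolution"


def with_legacy_maven_resolution(cluster_spec):
    """Return a copy of cluster_spec with compute-plane Maven resolution disabled.

    The input dict and its spark_conf are not modified.
    """
    spec = dict(cluster_spec)
    spark_conf = dict(spec.get("spark_conf") or {})
    spark_conf[MAVEN_RESOLUTION_KEY] = "false"
    spec["spark_conf"] = spark_conf
    return spec


# Hypothetical cluster spec: all fields except spark_conf are placeholders.
example_spec = {
    "cluster_name": "maven-legacy-resolution",
    "spark_version": "11.3.x-scala2.12",
    "spark_conf": {"spark.sql.shuffle.partitions": "64"},
}

print(json.dumps(with_legacy_maven_resolution(example_spec), indent=2))
```

Reverting this way restores the earlier resolution behavior; whitelisting the repositories instead lets you keep the new default.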
You may also need to whitelist the following Maven repositories:
- repos.spark-packages.org
- repo1.maven.org
- repo.maven.apache.org
- maven-central.storage-download.googleapis.com
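To see which of these hosts your cluster can actually reach, you can attempt a TCP connection to each one from a notebook cell on the cluster. This Python sketch assumes the repositories are accessed over HTTPS on port 443; the helper name check_hosts and its connect parameter (present only for testing) are illustrative, not a Databricks tool.

```python
import socket

# Hosts listed in this article that the cluster may need to reach.
MAVEN_HOSTS = [
    "repos.spark-packages.org",
    "repo1.maven.org",
    "repo.maven.apache.org",
    "maven-central.storage-download.googleapis.com",
]


def check_hosts(hosts, port=443, timeout=5.0, connect=socket.create_connection):
    """Try a TCP connection to each host and map host -> (reachable, detail)."""
    results = {}
    for host in hosts:
        try:
            # create_connection raises OSError (timeouts included) on failure.
            with connect((host, port), timeout=timeout):
                results[host] = (True, "connected")
        except OSError as exc:
            results[host] = (False, repr(exc))
    return results
```

On the cluster, iterating over check_hosts(MAVEN_HOSTS).items() and printing each result shows which hosts still need to be whitelisted. A successful TCP connection does not rule out a proxy or TLS inspection problem, so treat this as a first check rather than proof that library installation will succeed.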
If the issue persists, remove or disable any proxy script that could disrupt Databricks Runtime’s connections to the Maven repositories.
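This article does not specify what a disruptive proxy script looks like. One common pattern, assumed here, is a script that exports proxy environment variables or JVM proxy options. The following Python sketch lists such settings visible to the notebook's Python process. The variable names are widely used conventions rather than Databricks-specific settings, the helper name find_proxy_settings is hypothetical, and a proxy configured elsewhere (for example, through JVM options set in the cluster's Spark config) would not appear in its output.

```python
import os

# Environment variable names commonly used to route traffic through a proxy (compared case-insensitively).
PROXY_VARS = {"http_proxy", "https_proxy", "no_proxy", "all_proxy"}
# Variables from which the JVM reads extra options at startup; they may carry -D...proxyHost settings.
JVM_OPTION_VARS = ("JAVA_TOOL_OPTIONS", "_JAVA_OPTIONS")


def find_proxy_settings(environ=None):
    """Return environment entries that look like proxy configuration.

    Defaults to os.environ; pass a dict to inspect another environment.
    """
    env = os.environ if environ is None else environ
    found = {}
    for name, value in env.items():
        if name.lower() in PROXY_VARS:
            found[name] = value
        elif name in JVM_OPTION_VARS and "proxyhost" in value.lower():
            # Matches http.proxyHost, https.proxyHost, and socksProxyHost.
            found[name] = value
    return found
```

If print(find_proxy_settings()) on the cluster reports unexpected entries, a proxy script may still be active.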