Databricks includes a number of default Java and Scala libraries. You can replace any of these libraries with another version by using a cluster-scoped init script to remove the default library jar and then install the version you require.
Identify the artifact ID
To identify the name of the jar file you want to remove:
- Click the Databricks Runtime version you are using from the list of supported releases (AWS | Azure | GCP).
- Navigate to the Java and Scala libraries section.
- Identify the Artifact ID for the library you want to remove.
Use the artifact ID to find the jar filename
Use the ls -l command in a notebook to find the jar file that contains the artifact ID. For example, to find the jar filename for the spark-snowflake_2.12 artifact ID in Databricks Runtime 7.0, run the following command:
%sh ls -l /databricks/jars/*spark-snowflake_2.12*
This returns the full path of the matching jar file.
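If you only need the bare filename for the init script, basename strips the directory from the path. This is a minimal sketch; the jar path below is a hypothetical placeholder, so substitute the path returned by the ls command above:

```shell
# Hypothetical jar path; replace with the path returned by the ls command.
JAR_PATH="/databricks/jars/example--net.snowflake--spark-snowflake_2.12--2.5.9.jar"

# basename strips the directory, leaving only the jar filename.
basename "$JAR_PATH"
```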
Upload the replacement jar file
Upload your replacement jar file to a DBFS path.
Create the init script
Use the following template to create a cluster-scoped init script.
#!/bin/bash
rm -rf /databricks/jars/<jar_filename_to_remove>.jar
cp /dbfs/<path_to_replacement_jar>/<replacement_jar_filename>.jar /databricks/jars/
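One way to build the init script is to generate it from the template with a heredoc and then copy the result to your DBFS init script location. This is a sketch only; the jar names below are hypothetical placeholders, so substitute the jar filename you identified earlier and the DBFS path of your replacement jar:

```shell
# Hypothetical names; substitute the jar filename found earlier
# and the DBFS path of your replacement jar.
JAR_TO_REMOVE="old-library_2.12-1.0.0.jar"
REPLACEMENT_JAR="/dbfs/FileStore/jars/new-library_2.12-2.0.0.jar"

# Generate the init script from the template.
cat > replace-jar.sh <<EOF
#!/bin/bash
rm -rf /databricks/jars/${JAR_TO_REMOVE}
cp ${REPLACEMENT_JAR} /databricks/jars/
EOF

chmod +x replace-jar.sh
cat replace-jar.sh
```

After generating the file, copy it to a DBFS path and reference that path in the cluster's init script settings.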
Using the spark-snowflake_2.12 example from the prior step would result in an init script similar to the following:
#!/bin/bash
rm -rf /databricks/jars/----workspace_spark_3_0--maven-trees--hive-2.3__hadoop-2.7--net.snowflake--spark-snowflake_2.12--net.snowflake__spark-snowflake_2.12__2.5.9-spark_2.4.jar
cp /dbfs/FileStore/jars/e43fe9db_c48d_412b_b142_cdde10250800-spark_snowflake_2_11_2_7_1_spark_2_4-b2adc.jar /databricks/jars/