Databricks recently published a blog on Log4j 2 Vulnerability (CVE-2021-44228) Research and Assessment. Databricks does not directly use a version of Log4j known to be affected by this vulnerability within the Databricks platform in a way we understand may be vulnerable.
Databricks also does not use the affected classes from Log4j 1.x with known vulnerabilities (CVE-2021-4104, CVE-2020-9488, and CVE-2019-17571). However, if your code uses one of these classes (JMSAppender or SocketServer), it may be affected by these vulnerabilities.
If your code uses Log4j, you should upgrade to Log4j 2.17 or above.
If you cannot upgrade for technical reasons, you can use a global init script (AWS | Azure | GCP) to strip the affected classes from Log4j on cluster start.
Configure the global init script
AWS
- Go to the Admin Console and click the Global Init Scripts tab.
- Click the + Add button.
- Enter the name of the script.
- Copy the following script into the Script field.
%sh
#!/bin/bash
echo 'Init script to remove certain Log4J 1.x classes, version 1.0 (2021-12-17)'
FILES_TO_DELETE=(
  org/apache/log4j/net/JMSAppender.class
  org/apache/log4j/net/SocketServer.class
)
find "/databricks" \
  -name '*log4j*.jar' \
  -exec echo -e "\nProcessing {}" \; \
  -exec zip -d {} "${FILES_TO_DELETE[@]}" \;
exit 0
- If you have more than one global init script configured for your workspace, you should configure this script to run after your other scripts.
- Ensure the Enabled switch is toggled on.
- Click Add.
- Restart ALL running clusters.
Azure
- Go to the Admin Console and click the Global Init Scripts tab.
- Click the + Add button.
- Enter the name of the script.
- Copy the following script into the Script field.
%sh
#!/bin/bash
echo 'Init script to remove certain Log4J 1.x classes, version 1.0 (2021-12-17)'
FILES_TO_DELETE=(
  org/apache/log4j/net/JMSAppender.class
  org/apache/log4j/net/SocketServer.class
)
find "/databricks" \
  -name '*log4j*.jar' \
  -exec echo -e "\nProcessing {}" \; \
  -exec zip -d {} "${FILES_TO_DELETE[@]}" \;
exit 0
- If you have more than one global init script configured for your workspace, you should configure this script to run after your other scripts.
- Ensure the Enabled switch is toggled on.
- Click Add.
- Restart ALL running clusters.
GCP
Use the Global Init Scripts API 2.0 to apply the following init script to every cluster in your workspace.
%sh
#!/bin/bash
echo 'Init script to remove certain Log4J 1.x classes, version 1.0 (2021-12-17)'
FILES_TO_DELETE=(
  org/apache/log4j/net/JMSAppender.class
  org/apache/log4j/net/SocketServer.class
)
find "/databricks" \
  -name '*log4j*.jar' \
  -exec echo -e "\nProcessing {}" \; \
  -exec zip -d {} "${FILES_TO_DELETE[@]}" \;
exit 0
Restart ALL running clusters after applying the global init script.
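The API call mentioned for GCP can be sketched with curl. This is a minimal sketch, not a definitive implementation: DATABRICKS_HOST and DATABRICKS_TOKEN are hypothetical environment variables holding your workspace URL and a personal access token, and remove-log4j1-classes.sh is a hypothetical local file containing the script above, starting at the #!/bin/bash line. The endpoint expects the script content to be base64-encoded. The script name is assumed to contain no characters that need JSON escaping.

```shell
#!/bin/bash
# Sketch only: DATABRICKS_HOST, DATABRICKS_TOKEN, and the file name are assumptions.

# Build the JSON request body. The API expects the script base64-encoded.
build_payload() {
  local name="$1" script_file="$2" encoded
  # Strip newlines because some base64 implementations wrap long output.
  encoded="$(base64 < "$script_file" | tr -d '\n')"
  printf '{"name": "%s", "script": "%s", "enabled": true}' "$name" "$encoded"
}

# POST the payload to the Global Init Scripts API 2.0.
create_global_init_script() {
  local name="$1" script_file="$2"
  curl --fail --silent --show-error \
    -X POST "${DATABRICKS_HOST}/api/2.0/global-init-scripts" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$name" "$script_file")"
}

# Example call (hypothetical names):
#   create_global_init_script "remove-log4j1-classes" "./remove-log4j1-classes.sh"
```

The payload builder is kept separate from the request so you can inspect the JSON before sending it. If other global init scripts exist, check the API reference for how to control the run order so that this script runs last.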
Verify the affected classes are not available
You should run a test on each cluster to ensure the affected classes are not available.
Test 1
You can run an assert check on the affected classes in a notebook.
%scala
assert(this.getClass.getClassLoader().getResource("org/apache/log4j/net/JMSAppender.class") == null)
assert(this.getClass.getClassLoader().getResource("org/apache/log4j/net/SocketServer.class") == null)
This sample code runs successfully if you have disabled the affected classes.
This sample code should return an error if you have NOT disabled the affected classes.
Test 2
You can attempt to import the affected classes into a notebook.
%scala
import org.apache.log4j.net.JMSAppender
import org.apache.log4j.net.SocketServer
This sample code runs successfully if you have NOT disabled the affected classes.
This sample code should return an error if you have disabled the affected classes.
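As a complement to the two notebook tests, you can also inspect the jar files directly. The following is a minimal sketch, assuming unzip is installed on the cluster and that the jars live under /databricks, as in the init script. The function name scan_log4j_jars is hypothetical. A %sh cell runs on the driver node only, so this checks the driver's filesystem.

```shell
#!/bin/bash
# Sketch only: assumes `unzip` is available; /databricks mirrors the init script's search root.

AFFECTED_CLASSES=(
  org/apache/log4j/net/JMSAppender.class
  org/apache/log4j/net/SocketServer.class
)

# Print every Log4j jar under the given directory that still contains an
# affected class. Returns 1 if any is found, 0 otherwise.
scan_log4j_jars() {
  local root="${1:-/databricks}" found=0 jar cls listing
  while IFS= read -r -d '' jar; do
    # Capture the archive listing once, then search it for each class.
    listing="$(unzip -l "$jar" 2>/dev/null || true)"
    for cls in "${AFFECTED_CLASSES[@]}"; do
      if grep -qF "$cls" <<< "$listing"; then
        echo "STILL PRESENT: $cls in $jar"
        found=1
      fi
    done
  done < <(find "$root" -name '*log4j*.jar' -print0 2>/dev/null)
  return "$found"
}

# Example call:
#   scan_log4j_jars /databricks && echo "No affected classes found"
```

No output and a zero exit status mean that no matching jar under the search root contains the affected classes. The scan only covers jars whose names match *log4j*.jar, the same pattern the init script uses.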
Caveats
There are some corner cases where you can re-introduce the Log4j 1.x versions of JMSAppender or SocketServer.
Problem
If you install a Maven library with a transitive dependency on Log4j 1.x, all of its classes are re-added to the classpath.
Solution
You can work around this issue by adding Log4j 1.x (log4j:log4j) to the Exclusions field when installing Maven libraries.
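If you install libraries through the Libraries API rather than the UI, the same exclusion can be expressed in the request body. This is a minimal sketch under stated assumptions: DATABRICKS_HOST and DATABRICKS_TOKEN are hypothetical environment variables, and the cluster ID and Maven coordinates in the example call are placeholders. log4j:log4j is the Maven groupId:artifactId of Log4j 1.x.

```shell
#!/bin/bash
# Sketch only: environment variables, cluster ID, and coordinates are assumptions.

# Build the body for POST /api/2.0/libraries/install, excluding Log4j 1.x
# from the library's transitive dependencies.
build_install_payload() {
  local cluster_id="$1" coordinates="$2"
  printf '{"cluster_id": "%s", "libraries": [{"maven": {"coordinates": "%s", "exclusions": ["log4j:log4j"]}}]}' \
    "$cluster_id" "$coordinates"
}

# Send the install request to the workspace.
install_maven_library() {
  local cluster_id="$1" coordinates="$2"
  curl --fail --silent --show-error \
    -X POST "${DATABRICKS_HOST}/api/2.0/libraries/install" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(build_install_payload "$cluster_id" "$coordinates")"
}

# Example call (placeholder values):
#   install_maven_library "0123-456789-abcde" "com.example:my-library:1.0.0"
```

After installing, rerun the verification tests on the cluster to confirm the classes were not re-added.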
Problem
If you configure an external Apache Hive metastore, Apache Spark uses Ivy to resolve and download the correct metastore client library, and all of its transitive dependencies, possibly including Log4j 1.x.
To speed up cluster launch, you can cache the downloaded jars on DBFS and use an init script to install from the cache. If you cache jars this way, the cache may include Log4j 1.x.
Solution
You can configure the init script for your external metastore to delete the affected classes.
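One way to do this is to reuse the deletion step from the global init script and point it at the directory where your metastore init script installs the cached jars. The following is a minimal sketch, not a definitive implementation: the function name strip_log4j_classes is hypothetical, and the directory in the example call is an assumption that you should replace with the path your own init script uses.

```shell
#!/bin/bash
# Sketch only: the jar directory passed in is an assumption about your setup.

FILES_TO_DELETE=(
  org/apache/log4j/net/JMSAppender.class
  org/apache/log4j/net/SocketServer.class
)

# Remove the affected classes from every Log4j jar under the given directory.
strip_log4j_classes() {
  local jar_dir="$1"
  find "$jar_dir" -name '*log4j*.jar' \
    -exec echo "Processing {}" \; \
    -exec zip -d {} "${FILES_TO_DELETE[@]}" \;
  # zip -d reports an error when a jar has nothing to delete; that is
  # expected, so always return success, as the global init script does.
  return 0
}

# Example call, placed after the step that copies jars out of the DBFS cache
# (hypothetical path):
#   strip_log4j_classes /databricks/hive_metastore_jars
```

Running the deletion inside the metastore init script keeps the fix next to the step that introduces the jars, so it does not depend on the order in which other init scripts run.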