How to overwrite log4j configurations on Databricks clusters

Learn how to overwrite log4j configurations on Databricks clusters.

Written by Adam Pavlacka

Last published at: February 29th, 2024

Warning

This article describes steps related to customer use of Log4j 1.x within a Databricks cluster. Log4j 1.x is no longer maintained and has three known CVEs (CVE-2021-4104, CVE-2020-9488, and CVE-2019-17571). If your code uses one of the affected classes (JMSAppender or SocketServer), your use may potentially be impacted by these vulnerabilities. You should not enable either of these classes in your cluster.

There is no standard way to overwrite the log4j configurations on a cluster with a custom configuration. Instead, you must overwrite the default configuration files using an init script.

The current configurations are stored in two log4j.properties files:

  • On the driver:
    %sh
    
    cat /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
  • On the worker:
    %sh
    
    cat /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties
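
If you only want to review the logger-related entries rather than the full files, you can filter them from a notebook cell. This is a minimal sketch, assuming the file paths shown above and that the entries use the standard Log4j 1.x log4j. prefix:

%sh

grep "^log4j" /home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties
grep "^log4j" /home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties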

To set class-specific logging on the driver or on workers, use the following script:

#!/bin/bash

# Determine whether this init script is running on the driver or on a worker,
# then select the matching log4j.properties file.
echo "Executing on Driver: $DB_IS_DRIVER"
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  LOG4J_PATH="/home/ubuntu/databricks/spark/dbconf/log4j/driver/log4j.properties"
else
  LOG4J_PATH="/home/ubuntu/databricks/spark/dbconf/log4j/executor/log4j.properties"
fi

echo "Adjusting log4j.properties here: ${LOG4J_PATH}"

# Append the custom property to the selected configuration file.
echo "log4j.<custom-prop>=<value>" >> "${LOG4J_PATH}"

Replace <custom-prop> with the property name, and <value> with the property value.
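
For example, to change the logging level for a specific class or package, append a log4j.logger entry. The package name below is only an illustration; substitute the logger you actually want to adjust:

echo "log4j.logger.org.apache.spark.scheduler=DEBUG" >> "${LOG4J_PATH}"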

Upload the script to DBFS, then add it as a cluster-scoped init script in the cluster configuration UI.
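
One way to upload the script is with the Databricks CLI. This is a sketch that assumes the CLI is installed and configured; the local file name set-log4j.sh and the DBFS destination path are placeholders you can change:

databricks fs cp set-log4j.sh dbfs:/databricks/scripts/set-log4j.sh

After uploading, reference the DBFS path (dbfs:/databricks/scripts/set-log4j.sh in this example) in the cluster's init scripts settings.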

The same script covers both the driver and the workers; the DB_IS_DRIVER environment variable determines which log4j.properties file is modified.

See Cluster node initialization scripts (AWS | Azure | GCP) for more information.