Problem
You receive INFO statements despite configuring your Apache Spark settings to suppress INFO output and emit only WARN statements. The issue is observed even after setting the py4j logger to WARN and setting the logging level to WARN in the Spark config in the Databricks UI.
The problem persists, leading to an overflow of INFO logs, which is problematic when integrating with monitoring tools such as Datadog.
Cause
Configuring Spark settings to suppress INFO logs does not override the default log4j2 settings on the Databricks cluster, which control logging behavior at a more granular level. These default log4j2 settings may still allow INFO logs to be generated.
Additionally, the Datadog integration may not respect the Spark configuration settings, leading to continued generation of INFO logs.
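You can confirm that the default log4j2 configuration is what still permits INFO output by inspecting it from a notebook. This is a minimal sketch, assuming the standard driver configuration path used on recent Databricks Runtime versions; verify the path on your cluster:

```
%sh
# List every logger level declared in the driver's log4j2 configuration.
# On a default cluster you should see level="INFO" entries, which explains
# why Spark-level settings alone do not silence the INFO logs.
grep -n 'level="' /databricks/spark/dbconf/log4j/driver/log4j2.xml
```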
Solution
Modify the log4j2 configuration file directly within the Databricks environment.
1. Use an init script that updates the log4j2.xml file to suppress INFO logs.

```
#!/bin/bash
set -e  # Exit the script on any error

# Define the log4j2 configuration file path (modify if needed)
LOG4J2_PATH="/databricks/spark/dbconf/log4j/driver/log4j2.xml"

# Lower every logger declared at INFO down to WARN
echo "Updating log4j2.xml to suppress INFO logs"
sed -i 's/level="INFO"/level="WARN"/g' "$LOG4J2_PATH"

echo "Completed log4j2 config changes at $(date)"
```
2. Upload the init script to Workspace Files. (You can create a .sh file in your workspace files folder, paste the contents of the script into it, and reference that file as the cluster's init script.)
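As an alternative to creating the file through the UI, the script can be uploaded with the Databricks CLI. This is a sketch for the unified CLI (v0.2xx); flag names differ in the legacy CLI, and the workspace path is a placeholder:

```
# Upload the local script to Workspace Files (workspace path is a placeholder)
databricks workspace import /Users/<your-workspace-folder>/log4j_warn.sh \
  --file ./log4j_warn.sh --format AUTO
```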
3. Configure the cluster to use the init script in the Init Scripts tab: set the destination type to Workspace and the file path to /Users/<your-workspace-folder>/log4j_warn.sh.
4. Restart the cluster to apply the changes.
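After the restart, you can verify that the change took effect. This is a minimal sketch from a notebook cell; the live driver log path is the usual location but may vary by runtime version, so treat it as an assumption:

```
%sh
# Count the WARN levels now declared in the config (should be non-zero)
grep -c 'level="WARN"' /databricks/spark/dbconf/log4j/driver/log4j2.xml
# Count remaining INFO lines in the live driver log (path assumed);
# grep -c exits non-zero when there are no matches, hence the || true
grep -c ' INFO ' /databricks/driver/logs/log4j-active.log || true
```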