Some workloads require custom Apache Hadoop properties to be set. You would normally do this in the core-site.xml file. In this article, we explain how you can set core-site.xml properties on a cluster.
Create the core-site.xml file in DBFS
You need to create a core-site.xml file and save it to DBFS, where your cluster can access it.
An easy way to create this file is via a bash script in a notebook.
This example code creates a hadoop-configs folder in DBFS and then writes a single-property core-site.xml file to that folder.
```shell
%sh

mkdir -p /dbfs/hadoop-configs/

cat << 'EOF' > /dbfs/hadoop-configs/core-site.xml
<property>
    <name><property-name-here></name>
    <value><property-value-here></value>
</property>
EOF
```
You can add multiple properties to the file by adding additional name/value pairs to the script.
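For example, a fragment with two properties looks like this (the property names shown here are hypothetical placeholders, not real Hadoop properties):

```xml
<property>
    <name>example.first.property</name>
    <value>first-value</value>
</property>
<property>
    <name>example.second.property</name>
    <value>second-value</value>
</property>
```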
You can also create this file locally, and then upload it to your cluster.
Create an init script that loads core-site.xml
This example code creates an init script called set-core-site-configs.sh that uses the core-site.xml file you just created.
If you manually uploaded a core-site.xml file and stored it elsewhere, you should update the config_xml value in the example code.
```python
%python

dbutils.fs.put("/databricks/scripts/set-core-site-configs.sh", """
#!/bin/bash

echo "Setting core-site.xml configs at `date`"

START_DRIVER_SCRIPT=/databricks/spark/scripts/start_driver.sh
START_WORKER_SCRIPT=/databricks/spark/scripts/start_spark_slave.sh

TMP_DRIVER_SCRIPT=/tmp/start_driver_temp.sh
TMP_WORKER_SCRIPT=/tmp/start_spark_slave_temp.sh

TMP_SCRIPT=/tmp/set_core-site_configs.sh

config_xml="/dbfs/hadoop-configs/core-site.xml"

cat >"$TMP_SCRIPT" <<EOL
#!/bin/bash

## Setting core-site.xml configs
sed -i '/<\/configuration>/{ r $config_xml
a \</configuration>
d
}' /databricks/spark/dbconf/hadoop/core-site.xml
EOL

cat "$TMP_SCRIPT" > "$TMP_DRIVER_SCRIPT"
cat "$TMP_SCRIPT" > "$TMP_WORKER_SCRIPT"

cat "$START_DRIVER_SCRIPT" >> "$TMP_DRIVER_SCRIPT"
mv "$TMP_DRIVER_SCRIPT" "$START_DRIVER_SCRIPT"

cat "$START_WORKER_SCRIPT" >> "$TMP_WORKER_SCRIPT"
mv "$TMP_WORKER_SCRIPT" "$START_WORKER_SCRIPT"

echo "Completed core-site.xml config changes `date`"
""", True)
```
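To see what the sed command in the init script does, you can reproduce it against scratch files in /tmp (the paths and property name below are illustrative only, not the ones the init script uses):

```shell
# Create a scratch copy of a minimal core-site.xml.
mkdir -p /tmp/sed-demo
cat > /tmp/sed-demo/core-site.xml <<'EOF'
<configuration>
</configuration>
EOF

# A fragment with one property, like the file created in DBFS earlier.
cat > /tmp/sed-demo/fragment.xml <<'EOF'
<property>
    <name>example.property</name>
    <value>example-value</value>
</property>
EOF

# Same pattern as the init script: on the line matching </configuration>,
# read in the fragment file, append a fresh closing tag, and delete the
# original closing-tag line. The fragment ends up inside <configuration>.
sed -i '/<\/configuration>/{ r /tmp/sed-demo/fragment.xml
a \</configuration>
d
}' /tmp/sed-demo/core-site.xml

cat /tmp/sed-demo/core-site.xml
```

Because sed queues the `r` and `a` output in the order the commands run, the fragment is emitted first and the new `</configuration>` line last, which keeps the XML well-formed.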
Attach the init script to your cluster
You need to configure the newly created init script as a cluster-scoped init script.
If you used the example code, your Destination is DBFS and the Init Script Path is dbfs:/databricks/scripts/set-core-site-configs.sh.
If you customized the example code, ensure that you enter the correct path and name of the init script when you attach it to the cluster.
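If you manage clusters through the Clusters API instead of the UI, the init script is declared in the cluster spec. A sketch of the relevant fragment, assuming the default path from the example code (all other cluster fields are omitted):

```json
{
  "init_scripts": [
    {
      "dbfs": {
        "destination": "dbfs:/databricks/scripts/set-core-site-configs.sh"
      }
    }
  ]
}
```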