Enable retries in init script
Init scripts are commonly used to configure Databricks clusters.
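In some scenarios you may want an init script to retry a command instead of failing on the first error, for example when a step such as copying a file from DBFS can fail intermittently. A failing cluster-scoped init script causes the cluster launch to fail, so a retry can keep an intermittent error from stopping the cluster from starting.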
Example init script
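This sample init script shows you how to implement a retry for a basic copy operation. It runs the copy up to five times, waiting five seconds between attempts. If every attempt fails, it collects process and log information into a zip archive, copies the archive to /dbfs/tmp/, and exits with a non-zero status.
You can use this sample code as a base for implementing retries in your own init script. Run it as a Scala notebook command: dbutils.fs.put writes the script to DBFS, and the final true argument overwrites any existing file at that path.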
dbutils.fs.put("dbfs:/databricks/<path-to-init-script>/retry-example-init.sh", """#!/bin/bash
echo "starting script at $(date)"

# Print an error message to stderr and exit with a non-zero status,
# which makes the init script fail.
function fail {
  echo "$1" >&2
  exit 1
}

# Run the given command, retrying up to $max times with $delay seconds between attempts.
function retry {
  local n=1
  local max=5
  local delay=5
  while true; do
    "$@" && break || {
      if [[ $n -lt $max ]]; then
        ((n++))
        echo "Command failed. Starting attempt $n/$max: $(date)"
        sleep $delay
      else
        # All attempts failed: collect process and log information for debugging
        # and copy the archive to DBFS so that it persists outside the cluster.
        echo "Collecting additional info for debugging..."
        ps aux > /tmp/ps_info.txt
        debug_log_file=debug_logs_${HOSTNAME}_$(date +"%Y-%m-%d--%H-%M").zip
        zip -r "/tmp/${debug_log_file}" /var/log/ /tmp/ps_info.txt /databricks/data/logs/
        cp "/tmp/${debug_log_file}" /dbfs/tmp/
        fail "The command has failed after $n attempts. $(date)"
      fi
    }
  done
}

# Pause for 15 seconds before starting the copy.
sleep 15s
echo "starting Copying at $(date)"
retry cp -rv /dbfs/libraries/xyz.jar /databricks/jars/
echo "Finished script at $(date)"
""", true)
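To check the retry logic outside a cluster, you can pull it out into a standalone function and run it in any bash shell. The sketch below is a variation on the script's retry function, not a Databricks-provided utility: the name retry_with_limit and its argument order (attempts, delay, then the command) are hypothetical. It returns the command's last exit status instead of calling exit, so the caller decides whether to fail the init script, and it omits the debug-log collection step.

```shell
#!/bin/bash

# Hypothetical helper (not part of Databricks): retry_with_limit MAX DELAY COMMAND [ARGS...]
# Runs COMMAND up to MAX times, sleeping DELAY seconds after each failed attempt.
# Returns 0 on the first success, or the command's last exit status if every attempt fails.
retry_with_limit() {
  local max=$1 delay=$2
  shift 2
  local n=1 status
  while true; do
    "$@" && return 0
    status=$?
    if (( n >= max )); then
      echo "Command failed after $n attempts: $*" >&2
      return "$status"
    fi
    echo "Attempt $n/$max failed (exit $status); retrying in ${delay}s" >&2
    ((n++))
    sleep "$delay"
  done
}

# Usage: a hypothetical flaky command that fails twice, then succeeds.
# The function runs in the current shell, so the global counter is updated.
calls=0
flaky() {
  calls=$((calls + 1))
  (( calls >= 3 ))
}
retry_with_limit 5 0 flaky && echo "succeeded after $calls attempts"
# prints "succeeded after 3 attempts" (the two failure messages go to stderr)
```

In an init script, you could call it as retry_with_limit 5 5 cp -rv /dbfs/libraries/xyz.jar /databricks/jars/ || exit 1, so the script still exits with a non-zero status once the attempts are exhausted.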