Enable retries in an init script

Init scripts are commonly used to configure Databricks clusters.

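In some scenarios, such as a command that can fail transiently while the cluster is starting, you may want to implement retries in an init script instead of letting the script fail on the first error.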

Example init script

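This sample init script shows you how to implement a retry for a basic copy operation. The retry function runs the command up to five times, waiting five seconds between attempts. If every attempt fails, the script collects debugging information, copies it to /dbfs/tmp/ as a zip archive, and exits with a non-zero status.

You can use this sample code as a base for implementing retries in your own init script. Run it in a notebook to write the script to DBFS. The final argument, true, overwrites any existing file at that path.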

dbutils.fs.put("dbfs:/databricks/<path-to-init-script>/retry-example-init.sh", """#!/bin/bash

echo "starting script at `date`"

# Print an error message to stderr and abort the init script.
function fail {
  echo "$1" >&2
  exit 1
}

# Run the given command up to $max times, waiting $delay seconds between attempts.
function retry {
  local n=1
  local max=5
  local delay=5
  while true; do
    "$@" && break || {
      if [[ $n -lt $max ]]; then
        ((n++))
        echo "Command failed. Attempt $n/$max: `date`"
        sleep "$delay"
      else
        # Out of attempts: save diagnostics to DBFS, then abort the script.
        echo "Collecting additional info for debugging.."
        ps aux > /tmp/ps_info.txt
        debug_log_file="debug_logs_${HOSTNAME}_$(date +"%Y-%m-%d--%H-%M").zip"
        zip -r "/tmp/${debug_log_file}" /var/log/ /tmp/ps_info.txt /databricks/data/logs/
        cp "/tmp/${debug_log_file}" /dbfs/tmp/
        fail "The command has failed after $n attempts. `date`"
      fi
    }
  done
}

sleep 15s
echo "starting Copying at `date`"
retry cp -rv /dbfs/libraries/xyz.jar /databricks/jars/

echo "Finished script at `date`"
""", true)