Enable retries in init script

Add a retry function to your init script.

Written by arjun.kaimaparambilrajan

Last published at: March 4th, 2022

Init scripts are commonly used to configure Databricks clusters.

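Some init script steps, such as copying files from DBFS, can fail because of transient errors. Retrying such a step a few times before giving up can keep a temporary problem from failing the whole cluster launch.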

Example init script

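This sample init script shows you how to implement a retry for a basic copy operation. The retry function runs a command up to five times and waits five seconds between attempts. If every attempt fails, it collects process and log information into a zip file, copies that file to /dbfs/tmp/, and exits with an error.

You can use this sample code as a base for implementing retries in your own init script. Running the cell writes the script to the DBFS path you specify, and you can then configure that path as a cluster-scoped init script.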

%scala

dbutils.fs.put("dbfs:/databricks/<path-to-init-script>/retry-example-init.sh", """#!/bin/bash

echo "Starting script at $(date)"

# Print an error message to stderr and exit with a non-zero status.
# A non-zero exit from a cluster-scoped init script causes the cluster launch to fail.
function fail {
  echo "$1" >&2
  exit 1
}

# Run the given command, retrying up to $max times with $delay seconds between attempts.
# If every attempt fails, collect debug information, copy it to DBFS, and fail the script.
function retry {
  local n=1
  local max=5
  local delay=5
  while true; do
    if "$@"; then
      break
    fi
    if [[ $n -lt $max ]]; then
      ((n++))
      echo "Command failed. Retrying in ${delay}s (attempt $n/$max): $(date)"
      sleep "$delay"
    else
      echo "Collecting additional info for debugging..."
      ps aux > /tmp/ps_info.txt
      debug_log_file="debug_logs_${HOSTNAME}_$(date +"%Y-%m-%d--%H-%M").zip"
      zip -r "/tmp/${debug_log_file}" /var/log/ /tmp/ps_info.txt /databricks/data/logs/
      cp "/tmp/${debug_log_file}" /dbfs/tmp/
      fail "The command failed after $n attempts. $(date)"
    fi
  done
}

# Wait briefly before the first attempt.
sleep 15s
echo "Starting copy at $(date)"
retry cp -rv /dbfs/libraries/xyz.jar /databricks/jars/

echo "Finished script at $(date)"
""", true)
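If you want the wait between attempts to grow after each failure, you can swap the fixed delay for exponential backoff. The following is a minimal sketch of that variation and is not part of the original example. The function name retry_backoff and the RETRY_MAX and RETRY_DELAY variables are hypothetical names chosen for illustration, not Databricks settings. Unlike the retry function above, it returns a non-zero status instead of exiting, and it does not collect debug logs.

```bash
#!/bin/bash
# Sketch: retry a command with exponential backoff.
# retry_backoff, RETRY_MAX, and RETRY_DELAY are hypothetical names for this example.

retry_backoff() {
  local max="${RETRY_MAX:-5}"      # maximum number of attempts
  local delay="${RETRY_DELAY:-5}"  # initial wait in seconds, doubled after each failure
  local n=1
  until "$@"; do
    if (( n >= max )); then
      echo "Command failed after $n attempts: $*" >&2
      return 1
    fi
    echo "Attempt $n/$max failed. Retrying in ${delay}s: $(date)" >&2
    sleep "$delay"
    n=$(( n + 1 ))
    delay=$(( delay * 2 ))
  done
}

# Example (inside an init script):
# retry_backoff cp -rv /dbfs/libraries/xyz.jar /databricks/jars/ || exit 1
```

Returning a status lets the caller decide what a final failure means. Appending || exit 1, as in the commented example, keeps the fail-the-launch behavior of the original script. With the default values, the waits between the five attempts are 5, 10, 20, and 40 seconds.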