Init script stored on a volume fails to execute on cluster start

Init scripts created on Windows systems and uploaded to Unity Catalog volumes use CRLF line endings. These must be converted to LF before the cluster can execute the script.

Written by kunal.jadhav

Last published at: October 24th, 2024

Problem

You are trying to use an init script stored on a Unity Catalog volume path, but your cluster fails to start. The error message in the cluster event logs is generic.

Script exit status is non-zero

The specific error message can be found in the init script logs (AWS | Azure | GCP).

bash: /Volumes/abc-init.sh: /bin/bash^M: bad interpreter: No such file or directory


If the init script is uploaded as a workspace file instead of stored on a volume, the cluster starts normally.

Cause

Init script execution fails when the script uses a carriage return + line feed (CRLF) as its newline instead of a line feed (LF). Linux uses LF as the newline, while Windows uses CRLF, and text files created on a Windows system default to CRLF. When the first line of the script ends in CRLF, the interpreter path is read as /bin/bash followed by a carriage return, which the error message displays as ^M. No file with that name exists, so the script fails with a "bad interpreter" error.

When text files are uploaded as workspace files, CRLF newlines are converted to LF. When text files are uploaded to Unity Catalog volumes, this conversion does not happen, so the init script cannot be executed when the cluster starts.
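The failure can be reproduced without a cluster. The following minimal Python sketch mimics how the interpreter path is taken from the first line of a script: everything up to the first LF, minus the leading #!. The helper name `shebang_interpreter` is hypothetical, and the parsing is simplified (only spaces and tabs are treated as separators); it illustrates the behavior and is not how Linux actually implements it.

```python
def shebang_interpreter(script: bytes) -> bytes:
    """Return the interpreter path named on a script's #! line (simplified)."""
    # Only the first line matters, and it ends at the first LF (b"\n").
    first_line = script.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        raise ValueError("script has no #! line")
    # Strip spaces and tabs only; a trailing carriage return (b"\r") is kept.
    interp = first_line[2:].strip(b" \t")
    # Anything after the first space or tab would be an interpreter argument.
    return interp.replace(b"\t", b" ").split(b" ", 1)[0]


lf_script = b'#!/bin/bash\necho "TEST"\n'
crlf_script = b'#!/bin/bash\r\necho "TEST"\r\n'

print(shebang_interpreter(lf_script))    # b'/bin/bash'
print(shebang_interpreter(crlf_script))  # b'/bin/bash\r'
```

With CRLF endings the interpreter path ends in a carriage return, which is the ^M shown in the bash error.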

Solution

Any init script uploaded to a volume must use LF newlines.

If you created your init scripts on a Windows system, you can:

  • Convert all CRLF newlines in your file to LF before uploading.
  • Upload your init script as a workspace file. This automatically converts the newlines to LF. You can then copy the init script from workspace files to a volume, and it will work as expected.
  • Create your init script directly in a Databricks notebook.
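For the first option, the conversion can be done locally before uploading. The following is a minimal Python sketch; the function name `convert_crlf_to_lf` is hypothetical. It reads the file as raw bytes so that Python performs no newline translation of its own, replaces every CRLF with LF, and rewrites the file in place. Command-line tools such as `dos2unix` do the same job if they are installed on your machine.

```python
from pathlib import Path


def convert_crlf_to_lf(path: str) -> int:
    """Rewrite the file at `path` with LF line endings.

    Returns the number of CRLF sequences that were replaced (0 if none).
    """
    p = Path(path)
    # Read raw bytes so Python does not translate newlines on read.
    data = p.read_bytes()
    count = data.count(b"\r\n")
    if count:
        # Only rewrite the file when there is something to convert.
        p.write_bytes(data.replace(b"\r\n", b"\n"))
    return count


# Example: convert a local copy of the script before uploading it to a volume.
# convert_crlf_to_lf("test-init.sh")
```

Running the function a second time on the same file returns 0, so it is safe to run on scripts that already use LF.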

Example

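This sample code creates an init script called test-init.sh in a Unity Catalog volume. Replace <catalog>, <schema>, and <volume> with the names for your volume. The script echoes the word TEST when run.

%python

# Unity Catalog volume paths have the form /Volumes/<catalog>/<schema>/<volume>/<path>.
# The triple-quoted string uses LF newlines, so the file is written with LF line endings.
dbutils.fs.put("/Volumes/<catalog>/<schema>/<volume>/test-init.sh",
"""#!/bin/bash
echo "TEST"
""", overwrite=True)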

Info

Verify that the first line of the init script is #!/bin/bash. This ensures the init script is executed with the correct shell interpreter.
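Both checks (the #!/bin/bash first line and the absence of carriage returns) can be scripted. The following is a minimal Python sketch, assuming your notebook or cluster can read the volume through its /Volumes/... path with standard Python file APIs. The function name `check_init_script` and the path in the usage comment are hypothetical placeholders.

```python
from pathlib import Path


def check_init_script(path: str) -> list:
    """Return a list of problems found in the init script at `path` (empty if none)."""
    data = Path(path).read_bytes()  # raw bytes, so any b"\r" stays visible
    problems = []

    # The first line should be exactly #!/bin/bash (a trailing CR is reported separately).
    first_line = data.split(b"\n", 1)[0]
    if first_line.rstrip(b"\r") != b"#!/bin/bash":
        problems.append("first line is not #!/bin/bash")

    # Any carriage return means the file still has Windows (CRLF) line endings.
    cr_count = data.count(b"\r")
    if cr_count:
        problems.append(f"contains {cr_count} carriage return(s); convert CRLF to LF")

    return problems


# Example (placeholder path):
# print(check_init_script("/Volumes/<catalog>/<schema>/<volume>/test-init.sh"))
```

An empty list means both checks passed.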