Problem
You are trying to use an init script stored on a Unity Catalog volume path, but your cluster fails to start. The error message in the cluster event logs is generic:
Script exit status is non-zero
The specific error message can be found in the init script logs (AWS | Azure | GCP):
bash: /Volumes/abc-init.sh: /bin/bash^M: bad interpreter: No such file or directory
If the init script is uploaded as a workspace file instead of stored on a volume, the cluster starts normally.
Cause
Init script execution fails when the script uses carriage return + line feed (CRLF) line endings instead of line feed (LF) line endings. Linux uses LF as its line ending, while Windows uses CRLF. Text files created on a Windows system default to CRLF.
When text files are uploaded as workspace files, CRLF line endings are converted to LF. When text files are uploaded to Unity Catalog volumes, this conversion does not happen. As a result, the cluster cannot process the init script correctly at startup.
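You can confirm the problem by inspecting the script's raw bytes from a notebook. This is a minimal sketch, assuming the volume is accessible through the /Volumes FUSE path; the path /Volumes/test/abc-init.sh is a hypothetical example.
%python
# Minimal sketch: check whether a script stored on a volume contains
# CRLF line endings. The path below is a hypothetical example.
with open("/Volumes/test/abc-init.sh", "rb") as f:
    contents = f.read()

if b"\r\n" in contents:
    print("File uses CRLF line endings and will fail as an init script.")
else:
    print("File uses LF line endings.")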
Solution
Any init script uploaded to a volume must use LF line endings.
If you created your init scripts on a Windows system, you can:
- Convert all CRLF line endings to LF in your file before uploading, as shown in the sketch after this list.
- Upload your init script as a workspace file. This automatically converts the line endings. You can then copy the init script from workspace files to a volume, and it will work as expected.
- Create your init script directly in a Databricks notebook.
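For the first option, a few lines of Python are enough. This is a minimal sketch, assuming the volume is writable through the /Volumes FUSE path from a notebook; the path is a hypothetical example. The same replacement also works on a local copy of the file before you upload it.
%python
# Minimal sketch: rewrite CRLF line endings as LF in place.
# The path below is a hypothetical example.
path = "/Volumes/test/abc-init.sh"

with open(path, "rb") as f:
    contents = f.read()

# Replace every CRLF with LF and write the file back.
with open(path, "wb") as f:
    f.write(contents.replace(b"\r\n", b"\n"))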
Example
This sample code creates an init script called test-init.sh in /Volumes/test. It echoes the word TEST when run.
%python
# dbutils.fs.put writes the Python string as-is, so the file gets LF line endings.
dbutils.fs.put("/Volumes/test/test-init.sh",
"""#!/bin/bash
echo "TEST"
""", overwrite=True)
Info
Verify that the first line of the init script is #!/bin/bash. This ensures the init script is executed with the correct shell interpreter.