Problem
You’re attempting to set up the Dask library using an init script from the NVIDIA tutorial *RAPIDS on Databricks: A Guide to GPU-Accelerated Data Processing*, and the cluster fails to start with the error `Script exit status is non-zero`.
Example init script
```bash
#!/bin/bash
set -e
# Install RAPIDS (cudf & dask-cudf) and dask-databricks
/databricks/python/bin/pip install --extra-index-url=https://pypi.nvidia.com \
cudf-cu11 \
dask[complete] \
dask-cudf-cu11 \
dask-cuda==24.04 \
dask-databricks
# Start Dask cluster with CUDA workers
dask databricks run --cuda
```
When you run the command `dask databricks run --cuda` in a notebook, you receive a separate error.
```
[22:25:01] INFO  Setting up Dask on a Databricks cluster.                cli.py:37
           ERROR Unable to find expected environment variables           cli.py:43
                 DB_IS_DRIVER and DB_DRIVER_IP. Are you running
                 this command on a Databricks multi-node cluster?
```
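You can confirm from a notebook cell which variables are missing. The following is a minimal diagnostic sketch (the helper name `missing_databricks_vars` is illustrative, not part of any library):

```python
import os

def missing_databricks_vars():
    """Return the Databricks driver environment variables that are absent."""
    required = ["DB_IS_DRIVER", "DB_DRIVER_IP"]
    return [name for name in required if name not in os.environ]

missing = missing_databricks_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("Both DB_IS_DRIVER and DB_DRIVER_IP are set.")
```

If this prints missing variables from a notebook on a running cluster, the variables were also unavailable when the init script ran.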
Cause
The init script fails because the `dask databricks run --cuda` command executes before Databricks sets the necessary environment variables, `DB_IS_DRIVER` and `DB_DRIVER_IP`. The init script failure, in turn, prevents the cluster from starting.
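A simplified sketch (a hypothetical stand-in, not the library's actual code) of the kind of check `dask-databricks` performs clarifies the failure mode: when the variables are absent, the CLI exits with a non-zero status, and because the init script runs under `set -e`, that non-zero exit aborts the script and the cluster launch.

```python
import os
import sys

def ensure_databricks_driver_env():
    """Simplified stand-in for the environment check reported by cli.py."""
    required = ("DB_IS_DRIVER", "DB_DRIVER_IP")
    if not all(name in os.environ for name in required):
        # sys.exit with a message exits with status 1; under `set -e`,
        # a non-zero exit from any command aborts the whole init script.
        sys.exit("Unable to find expected environment variables "
                 "DB_IS_DRIVER and DB_DRIVER_IP.")
```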
Solution
Modify your init script to verify that the required environment variables are set before executing the `dask databricks run --cuda` command.
Example init script with validation checks
This script installs the required packages and then checks whether `DB_IS_DRIVER` and `DB_DRIVER_IP` are set. If they are, it starts the Dask cluster with CUDA workers; otherwise, it skips the startup.
```bash
#!/bin/bash
set -e
# Install RAPIDS (cudf & dask-cudf) and dask-databricks
/databricks/python/bin/pip install --extra-index-url=https://pypi.nvidia.com \
cudf-cu11 \
dask[complete] \
dask-cudf-cu11 \
dask-cuda==24.04 \
dask-databricks
# Check if the necessary environment variables are set
if [[ -n "$DB_IS_DRIVER" && -n "$DB_DRIVER_IP" ]]; then
echo "Environment variables are set. Starting Dask cluster with CUDA workers."
dask databricks run --cuda
else
echo "Required environment variables DB_IS_DRIVER and DB_DRIVER_IP are not set. Skipping Dask cluster startup."
fi
```
Additionally, ensure that you are using Databricks Runtime 14.2 ML (which includes Apache Spark 3.5.0, GPU, and Scala 2.12) to avoid Python dependency problems, and use a single g4dn.xlarge node with a GPU attached so the init script completes successfully.