Init script to set up Dask library fails and cluster won’t start

Modify the initialization script to include a validation check for the required environment variables first.

Written by julian.campabadal

Last published at: December 20th, 2024

Problem

You’re attempting to set up the Dask library using an init script from the NVIDIA tutorial RAPIDS on Databricks: A Guide to GPU-Accelerated Data Processing, and the cluster fails to start, displaying the error Script exit status is non-zero.

 

Example init script 

 

```bash
#!/bin/bash
set -e

# Install RAPIDS (cudf & dask-cudf) and dask-databricks
/databricks/python/bin/pip install --extra-index-url=https://pypi.nvidia.com \
      cudf-cu11 \
      dask[complete] \
      dask-cudf-cu11 \
      dask-cuda==24.04 \
      dask-databricks

# Start Dask cluster with CUDA workers
dask databricks run --cuda
```

 

When you run the command dask databricks run --cuda in a notebook, you receive a separate error.

 

```
[22:25:01] INFO     Setting up Dask on a Databricks cluster.       cli.py:37
           ERROR    Unable to find expected environment variables  cli.py:43
                    DB_IS_DRIVER and DB_DRIVER_IP. Are you running
                    this command on a Databricks multi-node cluster?
```

 

Cause

The init script fails because the dask databricks run --cuda command executes before the necessary environment variables, DB_IS_DRIVER and DB_DRIVER_IP, are set. Because the init script exits with a non-zero status, the cluster does not start.
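To see which variables are actually available when the script runs, you can log them from the init script itself and inspect the file after startup. This is a diagnostic sketch, not part of the fix; the log path below is an assumed example, so use any location you can read later.

```shell
#!/bin/bash
# Diagnostic sketch (not part of the fix): record the DB_* environment
# variables visible to the init script. The log path is an assumed
# example; pick any location you can read after cluster startup.
LOG_FILE=/tmp/init_env_check.log

{
  echo "DB_* variables visible at init time:"
  env | grep '^DB_' || echo "(none found)"
} > "$LOG_FILE"
```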

 

Solution

Modify your init script to include a validation check for the required environment variables before executing the dask databricks run --cuda command. 

 

Example init script with validation checks

This script installs the required packages and then checks if DB_IS_DRIVER and DB_DRIVER_IP are set. If they are, it starts the Dask cluster with CUDA workers; otherwise, it skips the startup.

 

```bash
#!/bin/bash
set -e

# Install RAPIDS (cudf & dask-cudf) and dask-databricks
/databricks/python/bin/pip install --extra-index-url=https://pypi.nvidia.com \
      cudf-cu11 \
      dask[complete] \
      dask-cudf-cu11  \
      dask-cuda==24.04 \
      dask-databricks

# Check if the necessary environment variables are set
if [[ -n "$DB_IS_DRIVER" && -n "$DB_DRIVER_IP" ]]; then
  echo "Environment variables are set. Starting Dask cluster with CUDA workers."
  dask databricks run --cuda
else
  echo "Required environment variables DB_IS_DRIVER and DB_DRIVER_IP are not set. Skipping Dask cluster startup."
fi
```
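To sanity-check the conditional logic before uploading the script, you can exercise it locally with the startup command replaced by an echo. The function name and variable values below are illustrative only; the real dask databricks run --cuda command only works on a Databricks cluster.

```shell
#!/bin/bash
# Local sketch of the validation check. The real `dask databricks run --cuda`
# call is replaced with an echo so this can run anywhere.
check_and_start() {
  if [[ -n "$DB_IS_DRIVER" && -n "$DB_DRIVER_IP" ]]; then
    echo "would run: dask databricks run --cuda"
  else
    echo "skipping Dask cluster startup"
  fi
}

# Without the variables set, the startup is skipped:
unset DB_IS_DRIVER DB_DRIVER_IP
check_and_start    # prints "skipping Dask cluster startup"

# With both variables set (example values), startup would proceed:
DB_IS_DRIVER=TRUE DB_DRIVER_IP=10.0.0.1 check_and_start
```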

 

Additionally, ensure that you are using Databricks Runtime 14.2 ML with GPU support (which includes Apache Spark 3.5.0 and Scala 2.12) to avoid Python dependency problems, and use a single g4dn.xlarge node with a GPU attached so that the initialization script finishes successfully.
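As a reference, a cluster along these lines can be defined through the Databricks CLI. Every value below (cluster name, runtime version string, node type, worker count, and the workspace path of the init script) is an assumption to verify against your own workspace before use.

```shell
# Hypothetical cluster spec for the Databricks CLI. All values are
# examples: confirm the runtime version string, node type, worker count,
# and init script path in your own workspace before running this.
databricks clusters create --json '{
  "cluster_name": "rapids-dask",
  "spark_version": "14.2.x-gpu-ml-scala2.12",
  "node_type_id": "g4dn.xlarge",
  "num_workers": 1,
  "init_scripts": [
    {"workspace": {"destination": "/Users/<user>/rapids-init.sh"}}
  ]
}'
```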