Python commands fail on Machine Learning clusters

Python commands fail on Databricks Runtime for Machine Learning clusters when default Conda environment variables are overwritten.

Written by arjun.kaimaparambilrajan

Last published at: May 16th, 2022

Problem

You are using a Databricks Runtime for Machine Learning cluster and Python notebooks are failing.

You find an invalid syntax error in the logs.

SyntaxError: invalid syntax
  File "/local_disk0/tmp/1593092990800-0/PythonShell.py", line 363
    def __init__(self, *args, condaMagicHandler=None, **kwargs):

Cause

Default key-value pairs in the /etc/environment file are being overwritten by user-defined environment variables.

There are several default environment variables that should not be overwritten.

For example, MLFLOW_CONDA_HOME=/databricks/conda is set by default. Overwriting this value can result in the invalid syntax error.

This sample init script can cause the issue because each line uses >, which replaces the contents of the file, rather than >>, which appends to it.
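One way to confirm whether a default entry has been lost is to inspect the contents of /etc/environment from a notebook. The following is a minimal sketch, not an official diagnostic: the helper names parse_env_keys and missing_default_keys are hypothetical, and the list of required names contains only MLFLOW_CONDA_HOME because that is the only default named in this article.

```python
def parse_env_keys(env_text):
    """Return the set of variable names defined in /etc/environment-style text."""
    keys = set()
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, sep, _value = line.partition("=")
        if sep:
            keys.add(name.strip())
    return keys


def missing_default_keys(env_text, required=("MLFLOW_CONDA_HOME",)):
    """Return the required names that are absent from env_text, in order.

    The default `required` tuple is an assumption: it lists only the
    variable that this article names explicitly.
    """
    present = parse_env_keys(env_text)
    return [name for name in required if name not in present]
```

On a cluster, the helper could be called as missing_default_keys(open("/etc/environment").read()). A non-empty result suggests that an init script has replaced the file.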

%python

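# Problematic: each `>` truncates /etc/environment, so the default entries
# are lost and only the last line written (VAR3) remains.
dbutils.fs.put("/databricks/init-scripts/set-env.sh", """#!/bin/bash
sudo echo VAR1="VAL1" > /etc/environment
sudo echo VAR2="VAL2" > /etc/environment
sudo echo VAR3="VAL3" > /etc/environment
""", True)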
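The effect of replacing instead of appending can be reproduced without a cluster. The sketch below is an illustration under assumptions: it uses a temporary file in place of /etc/environment, and Python's "w" and "a" file modes stand in for the shell's > and >> redirects. The helper names write_lines and simulate are hypothetical.

```python
import os
import tempfile


def write_lines(path, lines, mode):
    """Write each line with a separate open(), like separate shell redirects.

    Mode "w" mimics `>` (truncate first); mode "a" mimics `>>` (append).
    """
    for line in lines:
        with open(path, mode) as handle:
            handle.write(line + "\n")


def simulate(mode):
    """Start from a file holding one default entry, apply three writes,
    and return the resulting lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "environment")
        with open(path, "w") as handle:
            handle.write("MLFLOW_CONDA_HOME=/databricks/conda\n")
        write_lines(path, ["VAR1=VAL1", "VAR2=VAL2", "VAR3=VAL3"], mode)
        with open(path) as handle:
            return handle.read().splitlines()


overwritten = simulate("w")  # only the last write survives
appended = simulate("a")     # the default entry and all three variables survive
```

With truncating writes, the default entry disappears along with VAR1 and VAL2's line; with appending writes, every entry is kept.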

Solution

You should not overwrite any values in the /etc/environment file.

You should always append variables to the /etc/environment file.

This sample init script avoids the issue by appending every value to the /etc/environment file.

%python

# Each `>>` appends to /etc/environment, so the default entries are preserved.
dbutils.fs.put("/databricks/init-scripts/set-env.sh", """#!/bin/bash
sudo echo VAR1="VAL1" >> /etc/environment
sudo echo VAR2="VAL2" >> /etc/environment
sudo echo VAR3="VAL3" >> /etc/environment
""", True)
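If the init script is generated from a set of variables, building it in code makes it harder to slip in a replacing redirect by accident. The following is a sketch under assumptions: build_init_script is a hypothetical helper, not part of dbutils, and it quotes each assignment with shlex.quote from the Python standard library.

```python
import shlex


def build_init_script(variables, env_file="/etc/environment"):
    """Return a bash init script that appends each variable to env_file."""
    lines = ["#!/bin/bash"]
    for name, value in variables.items():
        assignment = f"{name}={value}"
        # shlex.quote protects the assignment from shell word splitting;
        # `>>` appends, so existing entries in env_file are preserved.
        lines.append(f"echo {shlex.quote(assignment)} >> {env_file}")
    return "\n".join(lines) + "\n"


script = build_init_script({"VAR1": "VAL1", "VAR2": "VAL2"})
# In a Databricks notebook, the text could then be stored with:
# dbutils.fs.put("/databricks/init-scripts/set-env.sh", script, True)
```

The generated text contains only appending redirects, so it can be reviewed or tested before it is uploaded.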