Python commands fail on Machine Learning clusters

Problem

You are using a Databricks Runtime for Machine Learning cluster and Python notebooks are failing.

You find an invalid syntax error in the logs.

SyntaxError: invalid syntax
  File "/local_disk0/tmp/1593092990800-0/PythonShell.py", line 363
    def __init__(self, *args, condaMagicHandler=None, **kwargs):

Cause

Default values in the /etc/environment file are being overwritten by user-defined environment variables.

There are several default environment variables that should not be overwritten.

For example, MLFLOW_CONDA_HOME=/databricks/conda is set by default. If you overwrite or remove this value, it can result in the invalid syntax error.

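This sample init script can cause the issue, because it uses > (which replaces the contents of the file) rather than >> (which appends to the file). Each echo line truncates /etc/environment, so only VAR3 remains and the default values are lost.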

dbutils.fs.put("/databricks/init-scripts/set-env.sh", """#!/bin/bash
sudo echo VAR1="VAL1" > /etc/environment
sudo echo VAR2="VAL2" > /etc/environment
sudo echo VAR3="VAL3" > /etc/environment
""", true)
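
The effect of > versus >> can be reproduced outside a cluster. The following is a minimal sketch, not Databricks code: it uses a temporary file as a stand-in for /etc/environment, and the starting content (DEFAULTS) is an assumption that holds only the one default variable named above. Opening the file in "w" mode mimics >, and "a" mode mimics >>.

```python
import os
import tempfile

# Assumed starting content; the real /etc/environment holds more entries.
DEFAULTS = "MLFLOW_CONDA_HOME=/databricks/conda\n"
NEW_LINES = ["VAR1=VAL1", "VAR2=VAL2", "VAR3=VAL3"]


def simulate(mode):
    """Write NEW_LINES to a scratch file, reopening it for every line.

    mode="w" mimics the shell's > (truncate), mode="a" mimics >> (append).
    Returns the final file content as a list of lines.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Seed the scratch file with the assumed default content.
        with open(path, "w") as f:
            f.write(DEFAULTS)
        # One open/write per line, like one echo per line in the init script.
        for line in NEW_LINES:
            with open(path, mode) as f:
                f.write(line + "\n")
        with open(path) as f:
            return f.read().splitlines()
    finally:
        os.remove(path)


print(simulate("w"))  # ['VAR3=VAL3']
print(simulate("a"))  # ['MLFLOW_CONDA_HOME=/databricks/conda', 'VAR1=VAL1', 'VAR2=VAL2', 'VAR3=VAL3']
```

With truncation, the default entry and the first two variables are gone. With appending, every line is preserved.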

Solution

You should not overwrite any values in the /etc/environment file.

You should always append variables to the /etc/environment file.

This sample init script avoids the issue by appending every value to the /etc/environment file.

dbutils.fs.put("/databricks/init-scripts/set-env.sh", """#!/bin/bash
sudo echo VAR1="VAL1" >> /etc/environment
sudo echo VAR2="VAL2" >> /etc/environment
sudo echo VAR3="VAL3" >> /etc/environment
""", true)
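
To confirm that an environment file still contains the default entries, a small check like the following may help. It is a sketch under stated assumptions: parse_env_keys and missing_defaults are hypothetical helper names, and the required list contains only MLFLOW_CONDA_HOME because that is the only default variable named in this article. Extend the list if you know of others on your cluster.

```python
def parse_env_keys(text):
    """Return the set of variable names defined in KEY=VALUE style text."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_defaults(text, required=("MLFLOW_CONDA_HOME",)):
    """List the required variable names that are absent from the text."""
    present = parse_env_keys(text)
    return [key for key in required if key not in present]


# Sample contents standing in for /etc/environment after each init script.
overwritten = "VAR3=VAL3\n"
appended = "MLFLOW_CONDA_HOME=/databricks/conda\nVAR1=VAL1\nVAR2=VAL2\nVAR3=VAL3\n"

print(missing_defaults(overwritten))  # ['MLFLOW_CONDA_HOME']
print(missing_defaults(appended))     # []
```

On a cluster where Python commands run, the same check can be applied to the real file by passing open("/etc/environment").read() as the text argument.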