Python REPL fails to start in Docker

Problem

When you use a Docker container that includes prebuilt Python libraries, Python commands fail and the virtual environment is not created. The following error message appears in the driver logs:

20/02/29 16:38:35 WARN PythonDriverWrapper: Failed to start repl ReplId-5b591-0ce42-78ef3-7
java.io.IOException: Cannot run program "/local_disk0/pythonVirtualEnvDirs/virtualEnv-56a5be60-3e71-486f-ac04-08e8f2491032/bin/python" (in directory "."): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.spark.util.Utils$.executeCommand(Utils.scala:1367)
        at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1393)
        at org.apache.spark.util.Utils$.executePythonAndGetOutput(Utils.scala:

        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 17 more

You can confirm the issue by running the following command in a notebook:

%sh virtualenv --no-site-packages

The result is an error message similar to the following:

usage: virtualenv [--version] [--with-traceback] [-v | -q] [--discovery {builtin}] [-p py] [--creator {builtin,cpython3-posix,venv}] [--seeder {app-data,pip}] [--no-seed] [--activators comma_separated_list] [--clear]
                  [--system-site-packages] [--symlinks | --copies] [--download | --no-download] [--extra-search-dir d [d ...]] [--pip version] [--setuptools version] [--wheel version] [--no-pip] [--no-setuptools] [--no-wheel]
                  [--clear-app-data] [--symlink-app-data] [--prompt prompt] [-h]
                  dest
virtualenv: error: the following arguments are required: dest

The virtualenv command does not recognize the --no-site-packages option.

Version

The problem affects all current Databricks Runtime versions, except those that include Conda. It occurs when the container uses virtualenv library version 20.0.0 or above.

Cause

This issue is caused by using a Python virtualenv library version in the Docker container that does not support the --no-site-packages option.

Databricks Runtime requires a virtualenv library that supports the --no-site-packages option. This option was removed in virtualenv library version 20.0.0, so version 20.0.0 and all later versions reject it.

You can verify your virtualenv library version by running the following command in a notebook:

%sh virtualenv --version
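
If you want to check the version in a script, the comparison against 20.0.0 can be sketched in Python as follows. This is a minimal sketch, not part of Databricks Runtime. The function name supports_no_site_packages is hypothetical, and it assumes the command output contains a dotted version number (for example "16.0.0", or "virtualenv 20.0.4 from ..." on newer releases).

```python
import re


def supports_no_site_packages(version_output: str) -> bool:
    """Return True if the reported virtualenv version is below 20.0.0.

    version_output is the text printed by `virtualenv --version`.
    Assumption: the first dotted number in that text is the version.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    major = int(match.group(1))
    # --no-site-packages was removed in 20.0.0, so any major version
    # below 20 still accepts the option.
    return major < 20


# Example inputs; the second string is an illustrative output format.
print(supports_no_site_packages("16.0.0"))
print(supports_no_site_packages("virtualenv 20.0.4 from /some/path"))
```

To use it, pass in the output you captured from the command above. A result of False means the container has an affected version.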

Solution

You can resolve the issue by specifying a compatible version when you install the virtualenv library.

For example, setting virtualenv==16.0.0 in the Dockerfile installs virtualenv library version 16.0.0. This version of the library supports the required option.
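
A Dockerfile that pins the version might look like the following sketch. The base image name is a hypothetical placeholder. The sketch also assumes that the pip on the image's PATH installs into the Python environment that the runtime uses, so adjust the pip path if your image keeps Python elsewhere.

```dockerfile
# Hypothetical base image; replace it with the image your cluster uses.
FROM my-registry/my-databricks-base:latest

# Pin virtualenv to a release that still accepts --no-site-packages.
# Assumption: this pip belongs to the Python environment the runtime uses.
RUN pip install virtualenv==16.0.0
```

After you rebuild the image and restart the cluster, run %sh virtualenv --version again to confirm that the pinned version is installed.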