Problem
When you use a Docker container that includes prebuilt Python libraries, Python commands fail and the virtual environment is not created. The driver logs show an error message similar to the following:
20/02/29 16:38:35 WARN PythonDriverWrapper: Failed to start repl ReplId-5b591-0ce42-78ef3-7
java.io.IOException: Cannot run program "/local_disk0/pythonVirtualEnvDirs/virtualEnv-56a5be60-3e71-486f-ac04-08e8f2491032/bin/python" (in directory "."): error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.util.Utils$.executeCommand(Utils.scala:1367)
    at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1393)
    at org.apache.spark.util.Utils$.executePythonAndGetOutput(Utils.scala: …
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 17 more
You can confirm the issue by running the following command in a notebook:
%sh virtualenv --no-site-packages
The result is an error message similar to the following:
usage: virtualenv [--version] [--with-traceback] [-v | -q] [--discovery {builtin}] [-p py] [--creator {builtin,cpython3-posix,venv}] [--seeder {app-data,pip}] [--no-seed] [--activators comma_separated_list] [--clear] [--system-site-packages] [--symlinks | --copies] [--download | --no-download] [--extra-search-dir d [d ...]] [--pip version] [--setuptools version] [--wheel version] [--no-pip] [--no-setuptools] [--no-wheel] [--clear-app-data] [--symlink-app-data] [--prompt prompt] [-h] dest
virtualenv: error: the following arguments are required: dest
The usage message does not list the --no-site-packages option, because this version of virtualenv no longer recognizes it.
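The same check can be scripted so that it reports a clear result instead of a usage message. The following is a minimal sketch, assuming that a virtualenv release which still accepts the option mentions --no-site-packages in its virtualenv --help output; the function name help_lists_no_site_packages is hypothetical. To run it in a notebook, place it in a cell whose first line is %sh.

```shell
#!/usr/bin/env bash
# Sketch: check whether the installed virtualenv still lists the
# --no-site-packages option in its help output.
# Assumption: releases that accept the flag mention it in --help.

# Return 0 if the help text passed as $1 mentions --no-site-packages,
# 1 otherwise. The -e flag lets grep accept a pattern that starts with "-".
help_lists_no_site_packages() {
  printf '%s\n' "$1" | grep -q -e '--no-site-packages'
}

# Run the check only if a virtualenv executable is on PATH.
if command -v virtualenv >/dev/null 2>&1; then
  if help_lists_no_site_packages "$(virtualenv --help 2>&1)"; then
    printf '%s\n' "--no-site-packages is listed"
  else
    printf '%s\n' "--no-site-packages is missing (expected with virtualenv 20.0.0 and later)"
  fi
else
  printf '%s\n' "virtualenv is not on PATH"
fi
```

Passing the help text in as an argument keeps the function independent of the environment, so the same logic can be applied to output captured elsewhere, such as a build log.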
Version
The problem affects all current Databricks Runtime versions except those that include Conda. It occurs with virtualenv library version 20.0.0 and above.
Cause
This issue occurs when the Docker container includes a version of the Python virtualenv library that does not support the --no-site-packages option.
Databricks Runtime requires a virtualenv library that supports the --no-site-packages option. The option was removed in virtualenv 20.0.0, so it is not available in that version or any later version.
You can verify your virtualenv library version by running the following command in a notebook:
%sh virtualenv --version
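To turn the version check into a pass/fail result, the following sketch parses the output of virtualenv --version and compares the major version against 20. It assumes that older releases print a bare number such as 16.0.0 and that 20.x releases print a line such as virtualenv 20.0.0 from /path/to/virtualenv/__init__.py; the function names parse_virtualenv_version and virtualenv_is_compatible are hypothetical. Run it in a notebook cell whose first line is %sh.

```shell
#!/usr/bin/env bash
# Sketch: classify `virtualenv --version` output as compatible (< 20.0.0)
# or incompatible (>= 20.0.0) with the --no-site-packages option.
# Assumption: the first dotted number in the output is the virtualenv version.

# Print the first dotted version number found in $1. Handles a bare
# string such as "16.0.0" as well as a line such as
# "virtualenv 20.0.0 from /path/to/virtualenv/__init__.py".
parse_virtualenv_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Exit status: 0 if the version predates 20.0.0, 1 if it is 20.0.0 or
# later, 2 if no version number could be found in $1.
virtualenv_is_compatible() {
  local version major
  version="$(parse_virtualenv_version "$1")"
  [ -n "$version" ] || return 2
  major="${version%%.*}"   # keep everything before the first dot
  [ "$major" -lt 20 ]
}

# Check the virtualenv on PATH, if there is one.
if command -v virtualenv >/dev/null 2>&1; then
  output="$(virtualenv --version 2>&1)"
  if virtualenv_is_compatible "$output"; then
    printf 'compatible: %s\n' "$output"
  else
    printf 'incompatible or unrecognized: %s\n' "$output"
  fi
else
  printf '%s\n' "virtualenv is not on PATH"
fi
```

Only the major version is compared, because 20.0.0 is the release boundary described above; a return status of 2 indicates that the command printed something unexpected rather than a version number.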
Solution
You can resolve the issue by specifying a compatible version when you install the virtualenv library.
For example, specifying virtualenv==16.0.0 in the Dockerfile installs virtualenv library version 16.0.0, which supports the --no-site-packages option.