Python REPL fails to start in Docker
Problem
When you use a Docker container that includes prebuilt Python libraries, Python commands fail and the virtual environment is not created. The following error message appears in the driver logs:
20/02/29 16:38:35 WARN PythonDriverWrapper: Failed to start repl ReplId-5b591-0ce42-78ef3-7
java.io.IOException: Cannot run program "/local_disk0/pythonVirtualEnvDirs/virtualEnv-56a5be60-3e71-486f-ac04-08e8f2491032/bin/python" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.util.Utils$.executeCommand(Utils.scala:1367)
at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:1393)
at org.apache.spark.util.Utils$.executePythonAndGetOutput(Utils.scala:
…
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 17 more
You can confirm the issue by running the following command in a notebook:
%sh virtualenv --no-site-packages
The result is an error message similar to the following:
usage: virtualenv [--version] [--with-traceback] [-v | -q] [--discovery {builtin}] [-p py] [--creator {builtin,cpython3-posix,venv}] [--seeder {app-data,pip}] [--no-seed] [--activators comma_separated_list] [--clear]
[--system-site-packages] [--symlinks | --copies] [--download | --no-download] [--extra-search-dir d [d ...]] [--pip version] [--setuptools version] [--wheel version] [--no-pip] [--no-setuptools] [--no-wheel]
[--clear-app-data] [--symlink-app-data] [--prompt prompt] [-h]
dest
virtualenv: error: the following arguments are required: dest
The virtualenv command does not recognize the --no-site-packages option.
Version
The problem affects all current Databricks Runtime versions except those that include Conda. It occurs with virtualenv library version 20.0.0 and above.
Cause
This issue occurs when the Docker container uses a version of the Python virtualenv library that does not support the --no-site-packages option.
Databricks Runtime requires a virtualenv library that supports the --no-site-packages option. This option was removed in virtualenv library version 20.0.0, so it is unavailable in that version and all later versions.
You can verify your virtualenv library version by running the following command in a notebook:
%sh virtualenv --version
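To interpret the version output, the comparison against 20.0.0 can be sketched in Python. This is a minimal illustration, not part of Databricks Runtime: the function names parse_virtualenv_version and lacks_no_site_packages are hypothetical, and the sketch assumes the output contains a dotted version number, either on its own (for example "16.7.9") or after the word "virtualenv" (for example "virtualenv 20.0.4 from ...").

```python
import re
from typing import Tuple


def parse_virtualenv_version(output: str) -> Tuple[int, ...]:
    """Extract the first dotted version number from `virtualenv --version` output.

    Assumption: the output contains something like "16.7.9" or
    "virtualenv 20.0.4 from /some/path"; other formats raise ValueError.
    """
    match = re.search(r"\d+(?:\.\d+)+", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def lacks_no_site_packages(output: str) -> bool:
    """Return True if the reported version is 20.0.0 or above.

    Those versions no longer accept --no-site-packages, so checking the
    major version number is sufficient.
    """
    major = parse_virtualenv_version(output)[0]
    return major >= 20


if __name__ == "__main__":
    # Sample strings are illustrative, not captured from a real cluster.
    for sample in ("16.7.9", "virtualenv 20.0.4 from /some/path"):
        print(sample, "->", lacks_no_site_packages(sample))
```

Because the notebook's Python REPL may not start on an affected cluster, a script like this would likely need to be saved in the container and run through %sh rather than in a Python cell.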