Error when installing Cartopy on a cluster
Problem
You are trying to install Cartopy on a cluster and you receive a ManagedLibraryInstallFailed error message.
java.lang.RuntimeException: ManagedLibraryInstallFailed: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, cartopy==0.17.0, --disable-pip-version-check) exited with code 1. ERROR: Command errored out with exit status 1:
command: /databricks/python3/bin/python3.7 /databricks/python3/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjoliwaky
cwd: /tmp/pip-install-t324easa/cartopy
Complete output (3 lines):
setup.py:171: UserWarning: Unable to determine GEOS version. Ensure you have 3.3.3 or later installed, or installation may fail.
'.'.join(str(v) for v in GEOS_MIN_VERSION), ))
Proj 4.9.0 must be installed.
----------------------------------------
ERROR: Command errored out with exit status 1: /databricks/python3/bin/python3.7 /databricks/python3/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjoliwaky Check the logs for full command output.
for library:PythonPyPiPkgId(cartopy,Some(0.17.0),None,List()),isSharedLibrary=false
Cause
Cartopy depends on libgeos 3.3.3 or above and libproj 4.9.0. If libgeos and libproj are not installed, Cartopy fails to install.
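If you want to confirm whether these system libraries are already present on the driver before adding an init script, a quick check like the following can help. This is a minimal sketch, assuming a Debian-based cluster image where dpkg is available; the package names shown (libgeos-dev, libproj-dev) are examples and may differ by Ubuntu release.
import subprocess

def pkg_installed(pkg):
    """Return True if dpkg reports the package as installed."""
    result = subprocess.run(["dpkg", "-s", pkg], capture_output=True, text=True)
    return result.returncode == 0

# Example package names; adjust for the image your cluster uses.
for pkg in ["libgeos-dev", "libproj-dev"]:
    print(pkg, "installed" if pkg_installed(pkg) else "missing")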
Solution
Configure a cluster-scoped init script to automatically install Cartopy and the required dependencies.
Create the base directory to store the init script in, if it does not already exist. This example uses dbfs:/databricks/<directory>.
dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
Create the script and save it to a file.
dbutils.fs.put("dbfs:/databricks/<directory>/cartopy.sh","""
#!/bin/bash
# Install the GEOS and PROJ development libraries, then install Cartopy.
sudo apt-get install libgeos++-dev -y
sudo apt-get install libproj-dev -y
/databricks/python/bin/pip install Cartopy
""",True)
Check that the script exists.
display(dbutils.fs.ls("dbfs:/databricks/<directory>/cartopy.sh"))
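You can also print the script contents to confirm they were written as expected; dbutils.fs.head works for this (the path below assumes the same location used above).
dbutils.fs.head("dbfs:/databricks/<directory>/cartopy.sh")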
On the cluster configuration page, click the Advanced Options toggle.
At the bottom of the page, click the Init Scripts tab.
In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.
Restart the cluster.
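After the cluster restarts and the init script completes, you can verify the installation by importing Cartopy in a notebook. A minimal sketch; the PlateCarree projection is used here only as an example.
# Confirm Cartopy imports and that the PROJ bindings work.
import cartopy
import cartopy.crs as ccrs

print(cartopy.__version__)
print(ccrs.PlateCarree())  # constructing a projection exercises the PROJ bindings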