Error when installing Cartopy on a cluster

Problem

When you try to install Cartopy on a cluster, the installation fails with a ManagedLibraryInstallFailed error message.

java.lang.RuntimeException: ManagedLibraryInstallFailed: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, cartopy==0.17.0, --disable-pip-version-check) exited with code 1.   ERROR: Command errored out with exit status 1:
   command: /databricks/python3/bin/python3.7 /databricks/python3/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjoliwaky
       cwd: /tmp/pip-install-t324easa/cartopy
  Complete output (3 lines):
  setup.py:171: UserWarning: Unable to determine GEOS version. Ensure you have 3.3.3 or later installed, or installation may fail.
    '.'.join(str(v) for v in GEOS_MIN_VERSION), ))
  Proj 4.9.0 must be installed.
  ----------------------------------------
ERROR: Command errored out with exit status 1: /databricks/python3/bin/python3.7 /databricks/python3/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpjoliwaky Check the logs for full command output.
 for library:PythonPyPiPkgId(cartopy,Some(0.17.0),None,List()),isSharedLibrary=false

Cause

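Cartopy depends on two native libraries: libgeos (version 3.3.3 or above) and libproj (version 4.9.0). pip installs only Python packages, not system libraries, so if libgeos and libproj are not already present on the cluster, Cartopy fails to install.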
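To confirm the cause before changing the cluster, you can check from a notebook whether the native libraries are visible to the dynamic loader. The following is a minimal sketch, not part of the official procedure. The helper name missing_native_libs is hypothetical, and the short names geos_c and proj are assumptions about how the shared libraries are named on the cluster image. It checks only runtime libraries, so the development headers needed to build Cartopy can still be absent when it reports nothing missing.

```python
import ctypes.util


def missing_native_libs(names=("geos_c", "proj")):
    """Return the library names that the dynamic loader cannot locate.

    The default names are assumptions: libgeos_c and libproj are the usual
    shared-library names for GEOS and PROJ on Debian/Ubuntu images.
    """
    return [name for name in names if ctypes.util.find_library(name) is None]


# An empty list means both runtime libraries were found on this node.
print(missing_native_libs())
```

This runs on the driver node only. Worker nodes are provisioned from the same image, so the result is usually representative, but that is an assumption rather than a guarantee.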

Solution

Configure a cluster-scoped init script to automatically install Cartopy and the required dependencies.

  1. Create the base directory to store the init script in, if it does not already exist. The following example uses dbfs:/databricks/<directory>.

    dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
    
  2. Create the script and save it to a file.

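    # The shebang must be the first line of the file, so it starts right after the opening quotes.
    dbutils.fs.put("dbfs:/databricks/<directory>/cartopy.sh","""#!/bin/bash
    # Refresh the package index so the installs below can resolve packages.
    sudo apt-get update
    # Install the GEOS and PROJ development libraries that Cartopy builds against.
    sudo apt-get install libgeos++-dev -y
    sudo apt-get install libproj-dev -y
    /databricks/python/bin/pip install Cartopy
    """,True)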
    
  3. Check that the script exists.

    display(dbutils.fs.ls("dbfs:/databricks/<directory>/cartopy.sh"))
    
  4. On the cluster configuration page, click the Advanced Options toggle.

  5. At the bottom of the page, click the Init Scripts tab.

  6. In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.

  7. Restart the cluster.
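
After the cluster restarts, you can confirm from a notebook that the init script installed Cartopy. This is a minimal sketch rather than an official verification step, and the helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the Python path."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("cartopy"):
    import cartopy

    # Print the installed version so you can compare it with the one you expect.
    print("Cartopy", cartopy.__version__)
else:
    print("Cartopy is not importable; check the init script logs for errors.")
```

If the module is not found, review the cluster's init script logs for failures from apt-get or pip before restarting the cluster again.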