Problem: Error When Installing pyodbc on a Cluster

Problem

One of the following errors occurs when you use pip to install the pyodbc library.

    java.lang.RuntimeException: Installation failed with message: Collecting pyodbc

    Library installation is failing due to missing dependencies. sasl and thrift_sasl are optional dependencies for SASL or Kerberos support

Cause

Although sasl and thrift_sasl are optional dependencies for SASL or Kerberos support, they must be present for the pyodbc installation to succeed.
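
To confirm the missing dependencies on your cluster, you can check for them directly in a notebook. If these commands (the same checks used in the solution steps below) return no output, the packages are not installed:

    %sh
    pip list | egrep 'thrift-sasl|sasl'
    dpkg -l | egrep 'libsasl2-dev|unixodbc-dev'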

Solution

Set up the solution in a single notebook

  1. In the notebook, check which SASL-related packages are installed, then upgrade thrift to the latest version.

    %sh
    pip list | egrep 'thrift-sasl|sasl'
    pip install --upgrade thrift
    
  2. Ensure that dependent packages are installed.

    %sh dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
    
  3. Install unixODBC and the other build dependencies before installing pyodbc (a quick install-and-verify sketch follows these steps).

    %sh sudo apt-get -y install unixodbc-dev libsasl2-dev gcc python-dev
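
With the dependencies in place, pyodbc should install without errors. As a minimal check, you can install it and confirm that the module loads (this sketch only verifies the import, not connectivity to any database):

    %sh
    pip install pyodbc
    python -c "import pyodbc; print(pyodbc.version)"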
    

Set up the solution as a cluster-scoped init script

You can put these commands into a single init script and attach it to the cluster. This ensures that the dependent libraries for pyodbc are installed before the cluster starts.

  1. Create the base directory for the init script, if it does not already exist. This example uses dbfs:/databricks/<directory>.

    dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
    
  2. Create the script and save it to a file.

    dbutils.fs.put("dbfs:/databricks/<directory>/tornado.sh","""
    #!/bin/bash
    pip list | egrep 'thrift-sasl|sasl'
    pip install --upgrade thrift
    dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
    sudo apt-get -y install unixodbc-dev libsasl2-dev gcc python-dev
    """,True)
    
  3. Check that the script exists.

    display(dbutils.fs.ls("dbfs:/databricks/<directory>/pyodbc-install.sh"))
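    # Optional: preview the script contents to confirm the file was written
    # correctly. dbutils.fs.head is a standard Databricks utility; the path
    # matches the example path used above.
    print(dbutils.fs.head("dbfs:/databricks/<directory>/pyodbc-install.sh"))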
    
  4. On the cluster configuration page, click the Advanced Options toggle.

  5. At the bottom of the page, click the Init Scripts tab.

  6. In the Destination drop-down, select DBFS, provide the file path to the script, and click Add.

  7. Restart the cluster.

For more details about cluster-scoped init scripts, see Cluster-scoped init scripts.
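
After the cluster restarts, you can confirm the installation from a notebook. As a minimal sketch: pyodbc.drivers() lists the ODBC drivers registered on the driver node, and an empty list simply means no ODBC driver has been configured yet.

    import pyodbc

    # Confirm the module loads and list any registered ODBC drivers
    print(pyodbc.version)
    print(pyodbc.drivers())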