Turbodbc is a Python module that uses the ODBC interface to access relational databases.
It depends on the libboost-all-dev, unixodbc-dev, and python-dev packages, which must be installed in that order.
You can install these manually, or you can use an init script to automate the install.
Create the init script
Run this sample script in a notebook to create the init script on your cluster.
%python
dbutils.fs.mkdirs("dbfs:/<path-to-init-script>")
dbutils.fs.put("dbfs:/<path-to-init-script>/turbodbc_install.sh", """#!/bin/bash
# install dependent packages
sudo apt-get -y install libboost-all-dev unixodbc-dev python-dev
pip install turbodbc==4.1.1
""", True)
Remember the path to the init script. You will need it when configuring your cluster.
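Before configuring the cluster, it can help to confirm that the script was written as intended. The following is a minimal sketch, not part of the original procedure: init_script_looks_valid is a hypothetical helper that checks that the text begins with a bash shebang on its first line and contains the pip install step for turbodbc. The commented lines assume a Databricks notebook, where dbutils.fs.head returns the beginning of a file as a string.

```python
def init_script_looks_valid(text: str) -> bool:
    """Return True if the text starts with a bash shebang and installs turbodbc.

    Hypothetical helper for illustration; it only inspects the script text.
    """
    lines = text.splitlines()
    # A shebang is only honored when it is the very first line of the file.
    has_shebang = bool(lines) and lines[0].strip() == "#!/bin/bash"
    installs_turbodbc = any("pip install turbodbc" in line for line in lines)
    return has_shebang and installs_turbodbc


# In a Databricks notebook (dbutils is only available there):
# text = dbutils.fs.head("dbfs:/<path-to-init-script>/turbodbc_install.sh")
# print(init_script_looks_valid(text))
```

If the check reports False, inspect the file contents and rerun the sample script before you attach the init script to a cluster.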
Configure the init script
Follow the documentation to configure a cluster-scoped init script (AWS | Azure | GCP).
Specify the path to the init script. Use the same path that you used in the sample script.
After configuring the init script, restart the cluster.
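Once the cluster has restarted, you may want to confirm that the init script actually installed turbodbc. The sketch below is one way to do that from a notebook cell, assuming Python 3.8 or later so that importlib.metadata is available. installed_version is a hypothetical helper name, not part of turbodbc or Databricks.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("turbodbc")
if version is None:
    # The package is absent; the init script likely failed or did not run.
    print("turbodbc is not installed; check the init script logs")
else:
    print(f"turbodbc {version} is installed")
```

With the sample script above, the reported version should match the one pinned in the pip install line.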