Install a private PyPI repo

How to install libraries from private PyPI repositories.

Written by darshan.bargal

Last published at: March 4th, 2022

Certain use cases may require you to install libraries from private PyPI repositories.

If you are installing from a public repository, you should review the library documentation.

This article shows you how to configure an example init script that authenticates and downloads a PyPI library from a private repository.

Create init script

  1. Create (or verify) a directory to store the init script. <init-script-folder> is the name of the folder where you store your init scripts.
    dbutils.fs.mkdirs("dbfs:/databricks/<init-script-folder>/")
  2. Create the init script.
    dbutils.fs.put("dbfs:/databricks/<init-script-folder>/private-pypi-install.sh","""#!/bin/bash
    /databricks/python/bin/pip install --index-url=https://${<repo-username>}:${<repo-password>}@<private-pypi-repo-domain-name> private-package==<version>
    """, True)
  3. Verify that your init script exists.
    display(dbutils.fs.ls("dbfs:/databricks/<init-script-folder>/private-pypi-install.sh"))
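If you maintain several private packages, it can help to generate the script text with a small helper instead of editing the string by hand. The following is a minimal sketch, not part of the original procedure. The function name build_init_script and the environment variable names REPO_USERNAME and REPO_PASSWORD are hypothetical, and pypi.example.com/simple stands in for your private repository.

```python
def build_init_script(index_host: str, package: str, version: str,
                      user_var: str = "REPO_USERNAME",
                      password_var: str = "REPO_PASSWORD") -> str:
    """Return the text of a pip-install init script.

    user_var and password_var are hypothetical environment variable
    names. They are written into the script as ${NAME} so that the
    shell expands them when the script runs on the cluster.
    """
    index_url = f"https://${{{user_var}}}:${{{password_var}}}@{index_host}"
    lines = [
        "#!/bin/bash",  # the shebang is the first line of the file
        f"/databricks/python/bin/pip install --index-url={index_url} {package}=={version}",
    ]
    return "\n".join(lines) + "\n"


# Example with placeholder values (assumptions, not real endpoints).
script = build_init_script("pypi.example.com/simple", "private-package", "1.0.0")
print(script)
```

In a notebook, the resulting string can then be passed as the second argument to dbutils.fs.put in place of the inline script text.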

Install as a cluster-scoped init script

Install the init script that you just created as a cluster-scoped init script.

You will need the full path to the location of the script (dbfs:/databricks/<init-script-folder>/private-pypi-install.sh).

Restart the cluster

Restart your cluster after you have installed the init script.

Once the cluster starts up, verify that it successfully installed the custom library from the private PyPI repository.
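One way to run this check from a notebook attached to the cluster is to ask Python for the installed version of the package. This is a minimal sketch that assumes the notebook runs in the same Python environment that the init script installed into. The helper name installed_version is hypothetical, and private-package is the placeholder name used in the init script.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Replace "private-package" with the name of your library.
version = installed_version("private-package")
if version is None:
    print("private-package is not installed")
else:
    print(f"private-package {version} is installed")
```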

If the custom library is not installed, double-check the username and password that you set for the private PyPI repository in the init script.
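A possible cause of authentication failures is a username or password that contains characters with a special meaning inside a URL, such as @, : or /. These characters must be percent-encoded when credentials are embedded in the index URL. The sketch below shows one way to build an encoded URL. The function name build_index_url and all values are illustrative assumptions, not part of the original procedure.

```python
from urllib.parse import quote


def build_index_url(username: str, password: str, host: str) -> str:
    """Build an index URL with percent-encoded credentials.

    safe="" makes quote() encode every reserved character, including
    "/", "@" and ":", which would otherwise break URL parsing.
    """
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"https://{user}:{secret}@{host}"


# Placeholder credentials and host, for illustration only.
print(build_index_url("svc-user", "p@ss:w/rd", "pypi.example.com/simple"))
```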

Use the init script with a job cluster

Once you have created the init script and verified that it works, you can include it in a create-job.json file when using the Jobs API to start a job cluster.

{
  "cluster_id": "1202-211320-brick1",
  "num_workers": 1,
  "spark_version": "<spark-version>",
  "node_type_id": "<node-type>",
  "cluster_log_conf": {
    "dbfs" : {
      "destination": "dbfs:/cluster-logs"
    }
  },
  "init_scripts": [ {
    "dbfs": {
      "destination": "dbfs:/databricks/<init-script-folder>/private-pypi-install.sh"
    }
  } ]
}
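If you generate the job definition programmatically, the same cluster settings can be assembled as a Python dictionary and serialized to JSON. This is a minimal sketch: the function name build_cluster_spec is hypothetical, the placeholder values are kept as they appear in this article, and the cluster_id field from the JSON is not reproduced here. The sketch only builds the JSON text and does not call the Jobs API.

```python
import json


def build_cluster_spec(init_script_path: str, spark_version: str,
                       node_type_id: str, num_workers: int = 1,
                       log_destination: str = "dbfs:/cluster-logs") -> dict:
    """Assemble cluster settings that reference a DBFS init script."""
    return {
        "num_workers": num_workers,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "cluster_log_conf": {"dbfs": {"destination": log_destination}},
        "init_scripts": [{"dbfs": {"destination": init_script_path}}],
    }


spec = build_cluster_spec(
    "dbfs:/databricks/<init-script-folder>/private-pypi-install.sh",
    "<spark-version>",
    "<node-type>",
)
# Serialize the settings so they can be saved to a file such as create-job.json.
print(json.dumps(spec, indent=2))
```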

