Introduction
You need to install Python libraries from a private PyPI repository that requires authentication.
Instructions
Important
Before you begin, Databricks recommends never hardcoding credentials. Instead, use the secrets utility to manage credentials securely. For more information, refer to the Secret management (AWS | Azure | GCP) documentation.
To install a package from a private PyPI repository using a notebook command, either configure a cluster-wide index URL or install the package using a notebook cell.
Method 1: Configure a cluster-wide index URL
If you want persistent, cluster-wide configuration (so all pip install commands always use your private repo by default), you can configure pip itself. This method ensures all subsequent pip install calls on the cluster will automatically use your private index URL without specifying -i each time.
You can do this by either using pip config directly, or creating an init script using pip config.
Option A: Use the pip config command directly
Run the following pip config commands in a terminal to set the index URL, an optional extra index URL, and the CA certificate path globally.
pip config --global set global.index-url https://<user>:<password>@your-private-repo-url
pip config --global set global.extra-index-url <any-other-index-url>
pip config --global set global.cert /etc/pip-certificates/cert.pem
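As a rough illustration of what these commands persist, the sketch below renders the equivalent configuration file content with Python's configparser. pip configuration files use an INI layout with a [global] section. The host name, credentials, and extra index URL are placeholder assumptions, and render_pip_conf is a hypothetical helper, not part of pip or Databricks.

```python
import configparser
import io

# Placeholder values (assumptions for illustration only).
settings = {
    "index-url": "https://user:password@artifactory1.example.com",
    "extra-index-url": "https://pypi.org/simple",
    "cert": "/etc/pip-certificates/cert.pem",
}

def render_pip_conf(options):
    """Return pip.conf-style text with the given options under [global]."""
    # interpolation=None keeps any "%" characters in URLs literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = options
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()

pip_conf_text = render_pip_conf(settings)
print(pip_conf_text)
```

Comparing this text with the output of pip config list can help confirm that the settings were stored as intended.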
Option B: Create an init script using pip config
The following code provides an example of an init script that uses pip config to persist the settings in a pip configuration file. With the --global option, pip writes to the system-wide configuration file (on Linux, typically /etc/pip.conf or an equivalent), so this approach is useful if you don't want to edit that file by hand.
#!/bin/bash
# Make sure your CA cert is on the node
mkdir -p /etc/pip-certificates
cat <<EOF >/etc/pip-certificates/cert.pem
-----BEGIN CERTIFICATE-----
<your-ca-bundle-here>
-----END CERTIFICATE-----
EOF
# Configure pip globally (using INIT script)
# Optionally add additional configs as needed
pip config --global set global.index-url https://<user>:<password>@your-private-repo-url
pip config --global set global.extra-index-url <any-other-index-url>
pip config --global set global.cert /etc/pip-certificates/cert.pem
Method 2: Install the package using a notebook cell
To install the package using a notebook cell, you can either run code in a notebook or use an init script.
Option A: Run code in a notebook
Before you run the code, replace the following variables with your own values.
- <your-secret-scope> and <your-key> with your Databricks secret scope and key.
- <your-private-repo-base-URL> with your repository host, in the format artifactory1.example.com.
- <package-name> with the name of the package you want to install.
- <version> with the package version you want to install.
- <your-repo-username> with your repository username in the format the repository requires. It may be your email address or another string.
%python
user = "<your-repo-username>"
pwd = dbutils.secrets.get("<your-secret-scope>", "<your-key>")
repo_url = f"https://{user}:{pwd}@<your-private-repo-base-URL>"
# Install the package using pip magic
%pip install -i {repo_url} <package-name>==<version>
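If the username or password contains characters such as @, :, or /, the URL built above may not parse correctly, because pip expects special characters in URL credentials to be percent-encoded. The following is a minimal sketch of one way to do that with the standard library. build_index_url is a hypothetical helper and the values are made up; in a notebook the password would come from dbutils.secrets.get as shown above.

```python
from urllib.parse import quote

def build_index_url(user, password, host):
    """Build an HTTPS index URL with percent-encoded credentials."""
    # safe="" also encodes "/" and "@", which would otherwise
    # be read as URL delimiters.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"https://{encoded_user}:{encoded_password}@{host}"

# Hypothetical example values (not real credentials).
repo_url = build_index_url(
    "jane@example.com", "p@ss/word", "artifactory1.example.com"
)
```

The resulting repo_url can then be passed to the -i option in the same way as the plain f-string version.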
If SSL issues occur, such as certificate or hostname trust issues, add the --trusted-host argument followed by your repository host.
%pip install -i {repo_url} --trusted-host <your-private-repo-base-URL> <package-name>
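The --trusted-host option expects a host name (optionally with a port), not a full URL and not a package name. As a small sketch, the host can be derived from the index URL so that credentials and paths are never passed to the option by mistake. trusted_host_for is a hypothetical helper, and the URLs are placeholders.

```python
from urllib.parse import urlsplit

def trusted_host_for(index_url):
    """Return the host, plus the port if one is set, for --trusted-host."""
    parts = urlsplit(index_url)
    host = parts.hostname  # excludes any user:password@ prefix
    return f"{host}:{parts.port}" if parts.port else host

# Placeholder URL (assumption for illustration only).
host = trusted_host_for("https://user:secret@artifactory1.example.com/simple")
print(host)
```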
Option B: Use an init script
To use an init script, review the Install a private PyPI repo KB article.
Troubleshooting
If the installation fails or behaves unexpectedly, add the verbose logging option -vvv to get detailed output.
%pip install -i {repo_url} <package-name> -vvv
This command prints detailed diagnostic information to help identify authentication errors, network issues, or pip incompatibilities.
Common Issues
Symptom | Possible cause | Suggested fix
401 Unauthorized | Invalid credentials or incorrect secret scope/key | Verify the secret scope and key name, then test with manually copied credentials.
Timeout / SSL error | Repository certificate not trusted | Add the --trusted-host argument.
Package not found | Incorrect package name or repository URL | Check the private repo URL, then validate the package name and version.
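To narrow down which of these rows applies, it can help to request the package's index page directly and look at the HTTP status. The sketch below assumes the repository exposes a PEP 503 style simple index and accepts HTTP basic authentication, which may not hold for every repository. All function names here are hypothetical helpers, index_url is passed without embedded credentials, and probe_status needs network access to your repository.

```python
import base64
import re
import urllib.error
import urllib.request

def normalize_name(package_name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", package_name).lower()

def project_page_url(index_url, package_name):
    """Build the URL of the package's page on a simple index."""
    return f"{index_url.rstrip('/')}/{normalize_name(package_name)}/"

def probe_status(index_url, package_name, user, password, timeout=10):
    """Return the HTTP status of the package page (requires network access)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        project_page_url(index_url, package_name),
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions; SSL and timeout
        # failures raise URLError instead and are left to propagate.
        return error.code

def likely_cause(status):
    """Map a status code to the matching row of the table above."""
    causes = {
        401: "Invalid credentials or incorrect secret scope/key",
        404: "Incorrect package name or repository URL",
    }
    return causes.get(status, "Check the verbose pip output for details")
```

For example, likely_cause(probe_status(index_url, "<package-name>", user, pwd)) returns the matching possible cause for a 401 or 404 response, while an SSL or timeout failure surfaces as a raised exception.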