Problem
You want to change the patch (micro) version of Python that is included with the version of Databricks Runtime you have selected.
Info
This method only allows you to update the patch version of Python, which is the third number in major.minor.patch. You cannot change the major or minor version.
For example, you can update Python 3.11.0 to Python 3.11.11.
You cannot update Python 3.11.x to Python 3.12.x.
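The rule above comes down to this: the first two numbers (the major.minor series) must stay the same, and only the last number may change. The following is a minimal sketch of that check. `series_of` and `in_same_series` are hypothetical helpers written for illustration. They are not part of Databricks, pyenv, or apt.

```shell
#!/bin/bash
# Hypothetical helper (not part of Databricks or pyenv): prints the
# major.minor series of a version string, e.g. 3.11.11 -> 3.11
series_of() {
  local major minor _rest
  IFS=. read -r major minor _rest <<< "$1"
  printf '%s.%s\n' "$major" "$minor"
}

# Succeeds only when both versions share the same major.minor series,
# meaning only the last number differs.
in_same_series() {
  [ "$(series_of "$1")" = "$(series_of "$2")" ]
}

if in_same_series 3.11.0 3.11.11; then echo "3.11.0 -> 3.11.11: supported"; fi
if ! in_same_series 3.11.11 3.12.1; then echo "3.11.11 -> 3.12.1: not supported"; fi
```

The sketch compares the major.minor prefix as a string rather than as numbers. This is enough to separate 3.11 from 3.12, and it also keeps 3.11 distinct from 3.110.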
Cause
Every version of Databricks Runtime ships with a specific version of Python. To see which Python version is included, review the Databricks Runtime release notes versions and compatibility (AWS | Azure | GCP) for your selected Databricks Runtime.
Open the release notes for your selected Databricks Runtime and review the System environment section.
In some cases, you may want to update the Python version while keeping your selected Databricks Runtime.
Solution
You can use a cluster-scoped init script (AWS | Azure | GCP) to install an updated version of Python on your cluster when it starts.
This example init script uses the “deadsnakes” repository to install Python and the pyenv GitHub repository to install the corresponding version of pyenv.
Info
You can use any Python repository (including an internal one) to install the Python binaries. The “deadsnakes” repository used here is one of many available Python sources.
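Before running the init script, replace <python-version-to-install> with the Python package to install (for example, python3.11) and <pyenv-version-to-install> with the pyenv release tag (for example, v2.5.0).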
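The two placeholder values can be derived from plain version numbers. The following is a minimal sketch that uses the naming shown in the Python 3.11 example in this article: deadsnakes package names use only major.minor (python3.11), and pyenv release tags have a leading "v" (v2.5.0). `python_package_for` and `pyenv_tarball_url` are hypothetical helpers, not part of any Databricks tooling.

```shell
#!/bin/bash
# Hypothetical helper: deadsnakes packages are named by major.minor,
# e.g. 3.11.11 -> python3.11
python_package_for() {
  local major minor _rest
  IFS=. read -r major minor _rest <<< "$1"
  printf 'python%s.%s\n' "$major" "$minor"
}

# Hypothetical helper: builds the pyenv release tarball URL used by the
# init script. Accepts either 2.5.0 or v2.5.0.
pyenv_tarball_url() {
  local version="${1#v}"   # drop a leading "v" if present
  printf 'https://github.com/pyenv/pyenv/archive/refs/tags/v%s.tar.gz\n' "$version"
}

python_package_for 3.11.11   # python3.11
pyenv_tarball_url 2.5.0      # https://github.com/pyenv/pyenv/archive/refs/tags/v2.5.0.tar.gz
```

The package name only selects the 3.11 series. The 3.11.x release that is actually installed depends on what the repository currently publishes.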
#!/bin/bash
# Add the deadsnakes PPA and refresh the package index
add-apt-repository -y ppa:deadsnakes/ppa
apt-get update --allow-releaseinfo-change-origin
# Install the requested Python package non-interactively, keeping existing config files
DEBIAN_FRONTEND=noninteractive apt -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install <python-version-to-install>
# Resolve any dependencies the install left broken
DEBIAN_FRONTEND=noninteractive apt -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" --fix-broken install
# Download the pyenv release and unpack it into /databricks/.pyenv
wget https://github.com/pyenv/pyenv/archive/refs/tags/<pyenv-version-to-install>.tar.gz -O pyenv.tar.gz \
&& tar -xvf pyenv.tar.gz --strip-components 1 -C /databricks/.pyenv \
&& rm pyenv.tar.gz
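After the cluster starts with the init script, you can confirm which Python version was actually installed. The following is a minimal sketch that is not part of the original procedure. `check_python_series` is a hypothetical helper that runs an interpreter with `--version` and checks that the reported version is in the expected major.minor series.

```shell
#!/bin/bash
# Hypothetical post-start check. Usage: check_python_series <python-binary> <expected-series>
# Example: check_python_series python3.11 3.11
check_python_series() {
  local python_bin="$1" expected="$2"
  local reported
  # "python --version" prints something like "Python 3.11.11"
  reported="$("$python_bin" --version 2>&1)"
  reported="${reported#Python }"
  case "$reported" in
    "$expected".*) echo "OK: $reported"; return 0 ;;
    *) echo "Unexpected Python version: $reported" >&2; return 1 ;;
  esac
}
```

For example, you could run `check_python_series python3.11 3.11` from a `%sh` notebook cell to check the interpreter installed by the deadsnakes package. This assumes the interpreter is on PATH as `python3.11`. Point the check at whichever interpreter your workloads use.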
Important
When using standard (formerly shared) access mode clusters, you must add _PIP_USE_IMPORTLIB_METADATA=false to the cluster's Spark config. This is required for library installations to work.
This init script also does not change the Python version used by UDFs on standard access mode clusters, because init scripts are not applied to UDF workers.
Example - Install Python 3.11.11 and pyenv 2.5.0
This example code builds on the sample above to install the latest version of Python 3.11.x and pyenv 2.5.0.
#!/bin/bash
# install the latest python 3.11 version
add-apt-repository -y ppa:deadsnakes/ppa
apt-get update --allow-releaseinfo-change-origin
DEBIAN_FRONTEND=noninteractive apt -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install python3.11
DEBIAN_FRONTEND=noninteractive apt -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" --fix-broken install
# install pyenv 2.5.0 that supports python 3.11.11
wget https://github.com/pyenv/pyenv/archive/refs/tags/v2.5.0.tar.gz -O pyenv.tar.gz \
&& tar -xvf pyenv.tar.gz --strip-components 1 -C /databricks/.pyenv \
&& rm pyenv.tar.gz
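To confirm that the pyenv tarball was unpacked where the init script expects it, you can check for the pyenv executable. The following is a minimal sketch that assumes the release tarball layout, where the executable is at `bin/pyenv` under the pyenv root once `--strip-components 1` is applied. `check_pyenv_dir` is a hypothetical helper, and `/databricks/.pyenv` is only used as its default because that is the target directory in the script above.

```shell
#!/bin/bash
# Hypothetical check that pyenv was unpacked into the given root directory.
# Usage: check_pyenv_dir [pyenv-root]   (defaults to /databricks/.pyenv)
check_pyenv_dir() {
  local pyenv_root="${1:-/databricks/.pyenv}"
  if [ -x "$pyenv_root/bin/pyenv" ]; then
    echo "pyenv found in $pyenv_root"
  else
    echo "pyenv not found in $pyenv_root" >&2
    return 1
  fi
}
```

The check only confirms that the executable exists. It does not verify which pyenv release was installed.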