PyPMML fails with Could not find py4j jar error

Written by arjun.kaimaparambilrajan

Last published at: April 30th, 2025

Note

This KB article is for Databricks Runtime 10.4 LTS. If you use Databricks Runtime 11.3 LTS or above, refer to the Notebook or workflow fails with “Error : Py4JError: Could not find py4j jar at” error after trying to install PyPMML on a cluster KB article instead.


Problem

PyPMML is a Python PMML scoring library.

After you install PyPMML on a Databricks cluster, loading a model fails with a Py4JError: Could not find py4j jar error.

%python

from pypmml import Model
modelb = Model.fromFile('/dbfs/shyam/DecisionTreeIris.pmml')

Error : Py4JError: Could not find py4j jar at

Cause

This error occurs because PyPMML depends on Py4J, and Databricks Runtime ships its own default Py4J library.

  • Databricks Runtime 5.0-6.6 uses Py4J 0.10.7.
  • Databricks Runtime 7.0 and above uses Py4J 0.10.9.

The default Py4J library is installed to a different location than a standard Py4J package. As a result, when PyPMML attempts to invoke Py4J from the default path, it fails.
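To see the mismatch for yourself, you can check which directories actually contain a Py4J jar. The following is a minimal sketch, not part of the documented fix. The function name py4j_jars_by_dir and the CANDIDATE_DIRS list are illustrative assumptions; the directories are the ones mentioned in this article and are not an exhaustive list of the paths Py4J searches.

```python
from pathlib import Path

# Directories mentioned in this article. This list is an assumption for
# illustration; it is not the complete set of locations Py4J may search.
CANDIDATE_DIRS = [
    "/share/py4j",
    "/current-release",
    "/databricks/python3/share/py4j",
]


def py4j_jars_by_dir(dirs=CANDIDATE_DIRS):
    """Map each directory to the sorted names of py4j*.jar files directly inside it."""
    found = {}
    for d in dirs:
        path = Path(d)
        if path.is_dir():
            found[str(d)] = sorted(p.name for p in path.glob("py4j*.jar"))
        else:
            # A missing directory is reported as containing no jar.
            found[str(d)] = []
    return found


if __name__ == "__main__":
    for directory, jars in py4j_jars_by_dir().items():
        print(directory, "->", jars or "no Py4J jar")
```

If the jar appears only under the Databricks-specific directory and not under the other two, that is consistent with the cause described above.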

Solution

Set up a cluster-scoped init script that copies the required Py4J jar file into the expected location.

  1. Use pip to install the version of Py4J that corresponds to your Databricks Runtime version.
    For example, in Databricks Runtime 6.5, run pip install py4j==0.10.7 in a notebook to install Py4J 0.10.7 on the cluster.
  2. Run find /databricks/ -name "py4j*jar" in a notebook to confirm the full path to the Py4J jar file. It is usually located in a path similar to /databricks/python3/share/py4j/.
  3. Manually copy the Py4J jar file from the install path to the DBFS path /dbfs/py4j/.
  4. Run the following code snippet in a Python notebook to create the install-py4j-jar.sh init script. Make sure the version number of Py4J listed in the snippet corresponds to your Databricks Runtime version.
    %python
    
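    # Write the init script to DBFS; the final True overwrites any existing file.
    # The shebang must be the first line of the script, so it directly follows the opening quotes.
    dbutils.fs.put("/databricks/init-scripts/install-py4j-jar.sh", """#!/bin/bash
    echo "Copying at `date`"
    # Create the target directories, then copy the jar into each of them.
    mkdir -p /share/py4j/ /current-release/
    cp /dbfs/py4j/py4j<version number>.jar /share/py4j/
    cp /dbfs/py4j/py4j<version number>.jar /current-release/
    echo "Copying completed at `date`"
    """, True)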
  5. Attach the install-py4j-jar.sh init script to your cluster, following the instructions in configure a cluster-scoped init script (AWS | Azure | GCP).
  6. Restart the cluster.
  7. Verify that PyPMML works as expected.
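
Steps 2 and 3 can also be scripted instead of done by hand. The following is a minimal sketch under stated assumptions: the helper name stage_py4j_jar is hypothetical, the default paths are the ones given in steps 2 and 3, and the cluster is assumed to expose DBFS at /dbfs. If several jars share a file name, only the first one found is copied.

```python
import shutil
from pathlib import Path


def stage_py4j_jar(search_root="/databricks", dest_dir="/dbfs/py4j"):
    """Copy each py4j*.jar found under search_root into dest_dir.

    Returns the list of destination paths, one per distinct jar file name.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    seen_names = set()
    # Recursive search, equivalent in spirit to: find <search_root> -name "py4j*jar"
    for jar in sorted(Path(search_root).rglob("py4j*.jar")):
        if jar.name in seen_names:
            continue  # skip duplicates of a jar that was already staged
        seen_names.add(jar.name)
        target = dest / jar.name
        shutil.copy2(jar, target)
        copied.append(str(target))
    return copied
```

Calling stage_py4j_jar() with no arguments in a notebook would use the paths from steps 2 and 3. Check the returned file names against the Py4J version for your Databricks Runtime before referencing them in the init script.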