PyPMML is a Python PMML scoring library.
After you install PyPMML on a Databricks cluster, loading a model fails with a
Py4JError: Could not find py4j jar error.
from pypmml import Model
modelb = Model.fromFile('/dbfs/shyam/DecisionTreeIris.pmml')

Error : Py4JError: Could not find py4j jar at
This error occurs due to a dependency on the default Py4J library.
- Databricks Runtime 5.0-6.6 uses Py4J 0.10.7.
- Databricks Runtime 7.0 and above uses Py4J 0.10.9.
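When scripting the fix, the version mapping listed above can be captured in a small helper. This is only a sketch: the function name py4j_version_for_runtime is hypothetical, and it covers only the two runtime ranges stated in this article.

```python
def py4j_version_for_runtime(runtime_version: str) -> str:
    """Return the Py4J version used by a Databricks Runtime version.

    Only the ranges stated in this article are covered:
    5.0-6.6 -> 0.10.7 and 7.0 and above -> 0.10.9.
    """
    # Accept strings such as "6.5" or "7.3.x-scala2.12"; use major.minor only.
    parts = runtime_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if (5, 0) <= (major, minor) <= (6, 6):
        return "0.10.7"
    if (major, minor) >= (7, 0):
        return "0.10.9"
    # Anything else is outside what the article documents.
    raise ValueError(
        f"No Py4J mapping known for Databricks Runtime {runtime_version}"
    )


print(py4j_version_for_runtime("6.5"))
```

Raising an error for unlisted runtimes is a deliberate choice here, so that an unknown version is not silently paired with the wrong jar.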
The default Py4J library is installed to a different location than a standard Py4J package. As a result, when PyPMML attempts to invoke Py4J from the default path, it fails.
Set up a cluster-scoped init script that copies the required Py4J jar file into the expected location.
Use pip to install the version of Py4J that corresponds to your Databricks Runtime version.
For example, in Databricks Runtime 6.5, run pip install py4j==0.10.7 in a notebook to install Py4J 0.10.7 on the cluster.
Run find /databricks/ -name "py4j*jar" in a notebook to confirm the full path to the Py4J jar file. It is usually located in a path similar to
Manually copy the Py4J jar file from the install path to the DBFS path
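The find-and-copy steps above can also be done from a Python notebook cell. The following is a minimal sketch, not an official procedure: the helper name copy_py4j_jars is hypothetical, and the example paths in the trailing comment are assumptions (/dbfs/py4j is the directory the init script later in this article reads from).

```python
import shutil
from pathlib import Path


def copy_py4j_jars(search_root: str, dest_dir: str) -> list:
    """Copy every file matching py4j*jar under search_root into dest_dir.

    Returns the list of destination paths that were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # rglob walks the tree recursively, like: find <root> -name "py4j*jar"
    for jar in sorted(Path(search_root).rglob("py4j*jar")):
        target = dest / jar.name
        # Skip directories and files that already are the destination file.
        if not jar.is_file() or jar.resolve() == target.resolve():
            continue
        shutil.copy(jar, target)
        copied.append(str(target))
    return copied


# On a cluster this might be called as follows (paths are assumptions):
# copy_py4j_jars("/databricks", "/dbfs/py4j")
```

If more than one jar version is found, all of them are copied; check the returned list against the version your runtime needs.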
Run the following code snippet in a Python notebook to create the
install-py4j-jar.sh init script. Make sure the version number of Py4J listed in the snippet corresponds to your Databricks Runtime version.
dbutils.fs.put("/databricks/init-scripts/install-py4j-jar.sh", """#!/bin/bash
echo "Copying at `date`"
mkdir -p /share/py4j/ /current-release/
cp /dbfs/py4j/py4j<version number>.jar /share/py4j/
cp /dbfs/py4j/py4j<version number>.jar /current-release/
echo "Copying completed at `date`"
""", True)
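To avoid editing the version number by hand in two places, the script text can be generated from a single version string. This is a sketch under assumptions: render_init_script is a hypothetical helper, and it assumes the jar file is named py4j followed directly by the version (for example py4j0.10.7.jar), as the placeholder in the snippet suggests.

```python
def render_init_script(py4j_version: str) -> str:
    """Build the init script text for the given Py4J version."""
    # Assumed naming: /dbfs/py4j/py4j<version number>.jar
    jar = f"/dbfs/py4j/py4j{py4j_version}.jar"
    lines = [
        "#!/bin/bash",
        'echo "Copying at `date`"',
        "mkdir -p /share/py4j/ /current-release/",
        f"cp {jar} /share/py4j/",
        f"cp {jar} /current-release/",
        'echo "Copying completed at `date`"',
    ]
    return "\n".join(lines) + "\n"


script = render_init_script("0.10.7")
print(script)

# In a Databricks notebook, the text could then be written with:
# dbutils.fs.put("/databricks/init-scripts/install-py4j-jar.sh", script, True)
```

Printing the script before writing it makes it easy to confirm that the jar name matches the file you copied to DBFS.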
Attach the install-py4j-jar.sh init script to your cluster, following the instructions in Configure a cluster-scoped init script.
Restart the cluster.
Verify that PyPMML works as expected.
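One way to verify is to repeat the model load that failed earlier and report the outcome instead of raising. The sketch below is an assumption about how you might check this: verify_pypmml is a hypothetical helper, and the model path is the one from the example at the top of this article, so substitute your own file.

```python
def verify_pypmml(model_path: str):
    """Try to load a PMML model and return (ok, message)."""
    try:
        # Imported here so that a missing package is reported, not raised.
        from pypmml import Model

        Model.fromFile(model_path)
    except Exception as exc:  # includes ImportError and Py4JError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "model loaded"


ok, message = verify_pypmml("/dbfs/shyam/DecisionTreeIris.pmml")
print(ok, message)
```

If the result still reports a Py4JError, recheck that the jar version copied by the init script matches your Databricks Runtime version.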