Databricks Knowledge Base

Libraries (Azure)

These articles can help you manage libraries in Databricks.

22 Articles in this category

Cannot import module in egg library

Problem You try to install an egg library on your cluster and it fails with a message that a module in the library cannot be imported. Even a simple import fails. import sys egg_path='/dbfs/<path-to-egg-file>/<egg-file>.egg' sys.path.append(egg_path) import shap_master Cause This error message occurs due to the way the library is pac...

Last updated: May 11th, 2022 by xin.wang
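
The summary above is cut off before it explains the packaging problem, so the following is only a diagnostic sketch, not the article's solution. An egg file is a zip archive, so one way to see which names it can actually provide to import is to list its top-level entries. The helper name top_level_names is hypothetical and not part of any Databricks API.

```python
import zipfile


def top_level_names(egg_path):
    """Return the sorted top-level package and module names inside an egg (zip) file."""
    names = set()
    with zipfile.ZipFile(egg_path) as archive:
        for entry in archive.namelist():
            first = entry.split("/", 1)[0]
            # Skip egg metadata and empty names.
            if first == "EGG-INFO" or not first:
                continue
            if "/" in entry:
                # Anything with a path separator lives inside a top-level directory.
                names.add(first)
            elif first.endswith((".py", ".pyc")):
                # A single-file module at the archive root.
                names.add(first.rsplit(".", 1)[0])
    return sorted(names)
```

If the name used in the import statement does not appear in the returned list, appending the egg to sys.path will not make that import succeed.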

Cannot import TabularPrediction from AutoGluon

Problem You are trying to import TabularPrediction from AutoGluon, but are getting an error message. ImportError: cannot import name 'TabularPrediction' from 'autogluon' (unknown location) This happens when AutoGluon is installed via a notebook or as a cluster-installed library (AWS | Azure | GCP). You can reproduce the error by running the import c...

Last updated: May 11th, 2022 by kavya.parag

Latest PyStan fails to install on Databricks Runtime 6.4

Problem You are trying to install the PyStan PyPI package on a Databricks Runtime 6.4 Extended Support cluster and get a ManagedLibraryInstallFailed error message. java.lang.RuntimeException: ManagedLibraryInstallFailed: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, pystan, --disable-pip-version-check) exited wit...

Last updated: May 11th, 2022 by rakesh.parija

Library unavailability causing job failures

Problem You are launching jobs that import external libraries and get an Import Error. When a job causes a node to restart, the job fails with the following error message: ImportError: No module named XXX Cause The Cluster Manager is part of the Databricks service that manages customer Apache Spark clusters. It sends commands to install Python and R...

Last updated: May 11th, 2022 by Adam Pavlacka

How to correctly update a Maven library in Databricks

Problem You make a minor update to a library in the repository, but you don’t want to change the version number because it is a small change for testing purposes. When you attach the library to your cluster again, your code changes are not included in the library. Cause One strength of Databricks is the ability to install third-party or custom libra...

Last updated: May 11th, 2022 by Adam Pavlacka

Init script fails to download Maven JAR

Problem You have an init script that is attempting to install a library via Maven, but it fails when trying to download a JAR. https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/0.4.1/rapids-4-spark_2.12-0.4.1.jar%0D Resolving repo1.maven.org (repo1.maven.org)... 151.101.248.209 Connecting to repo1.maven.org (repo1.maven.org)|151.101.248....

Last updated: May 11th, 2022 by arvind.ravish
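
The %0D at the end of the URL in the summary above is the URL-encoded form of a carriage return. One way a carriage return can end up there is a script saved with Windows (CRLF) line endings; the truncated summary does not confirm that this is the cause, so treat the following as a sketch under that assumption. Both function names are hypothetical.

```python
def find_carriage_returns(script_text):
    """Return 1-based line numbers of lines that end with a carriage return."""
    return [
        number
        for number, line in enumerate(script_text.split("\n"), start=1)
        if line.endswith("\r")
    ]


def to_unix_line_endings(script_text):
    """Convert CRLF line endings to LF, leaving all other content untouched."""
    return script_text.replace("\r\n", "\n")
```

Running find_carriage_returns on the script text before uploading it shows which lines would carry a trailing carriage return into commands such as a download URL.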

Install package using previous CRAN snapshot

Problem You are trying to install a library package via CRAN, and are getting a Library installation failed for library due to infra fault error message. Library installation failed for library due to infra fault for Some(cran { package: "<name-of-package>" } ). Error messages: java.lang.RuntimeException: Installation failed with message: Erro...

Last updated: May 11th, 2022 by darshan.bargal

Install PyGraphViz

PyGraphViz Python libraries are used to plot causal inference networks. If you try to install PyGraphViz as a standard library, it fails due to dependency errors. PyGraphViz has the following dependencies: python3-dev graphviz libgraphviz-dev pkg-config Install via notebook Install the dependencies with apt-get. %sh sudo apt-get install -y python3-de...

Last updated: May 11th, 2022 by pavan.kumarchalamcharla

Install Turbodbc via init script

Turbodbc is a Python module that uses the ODBC interface to access relational databases. It has dependencies on libboost-all-dev, unixodbc-dev, and python-dev packages, which need to be installed in order. You can install these manually, or you can use an init script to automate the install. Create the init script Run this sample script in a noteboo...

Last updated: May 11th, 2022 by John.Lourdu

Cannot uninstall library from UI

Problem Usually, libraries can be uninstalled in the Clusters UI. If the checkbox to select the library is disabled, then it’s not possible to uninstall the library from the UI. Cause If you create a library using REST API version 1.2 and if auto-attach is enabled, the library is installed on all clusters. In this scenario, the Clusters UI checkbox ...

Last updated: May 11th, 2022 by Adam Pavlacka

Error when installing Cartopy on a cluster

Problem You are trying to install Cartopy on a cluster and you receive a ManagedLibraryInstallFailed error message. java.lang.RuntimeException: ManagedLibraryInstallFailed: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, cartopy==0.17.0, --disable-pip-version-check) exited with code 1.   ERROR: Command errored out ...

Last updated: May 11th, 2022 by prem.jayaraj

Error when installing pyodbc on a cluster

Problem One of the following errors occurs when you use pip to install the pyodbc library. java.lang.RuntimeException: Installation failed with message: Collecting pyodbc "Library installation is failing due to missing dependencies. sasl and thrift_sasl are optional dependencies for SASL or Kerberos support" Cause Although sasl and thrift_sasl are o...

Last updated: May 11th, 2022 by Adam Pavlacka

Libraries fail with dependency exception

Problem You have a Python function that is defined in a custom egg or wheel file and also has dependencies that are satisfied by another custom package installed on the cluster. When you call this function, it returns an error that says the requirement cannot be satisfied. org.apache.spark.SparkException: Process List(/local_disk0/pythonVirtualEnv...

Last updated: May 11th, 2022 by jordan.hicks

Libraries failing due to transient Maven issue

Problem Job fails because libraries cannot be installed. Library resolution failed. Cause: java.lang.RuntimeException: Cannot download some libraries due to transient Maven issue. Please try again later Cause After a Databricks upgrade, your cluster attempts to download any required libraries from Maven. After downloading, the libraries are stored a...

Last updated: May 11th, 2022 by dayanand.devarapalli

Reading .xlsx files with xlrd fails

Problem You have xlrd installed on your cluster and are attempting to read files in the Excel .xlsx format when you get an error. XLRDError: Excel xlsx file; not supported Cause xlrd 2.0.0 and above can only read .xls files. Support for .xlsx files was removed from xlrd due to a potential security vulnerability. Solution Use openpyxl to open .xl...

Last updated: May 12th, 2022 by prakash.jha
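
The solution in the summary above is truncated, so the following is a minimal sketch of one way to route .xlsx files to openpyxl, assuming the files are read with pandas (an assumption; the summary does not say which reader is used). The function names are hypothetical; engine is a real parameter of pandas.read_excel.

```python
from pathlib import Path


def pick_excel_engine(path):
    """Choose a pandas Excel engine name from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xlsx":
        return "openpyxl"  # xlrd 2.0.0 and above cannot read .xlsx
    if suffix == ".xls":
        return "xlrd"
    raise ValueError(f"Unsupported Excel extension: {suffix!r}")


def read_excel_file(path):
    """Read a workbook into a DataFrame; needs pandas and the chosen engine installed."""
    import pandas as pd  # imported lazily so pick_excel_engine works without pandas

    return pd.read_excel(path, engine=pick_excel_engine(path))
```

Selecting the engine explicitly avoids depending on whichever default the installed pandas version would pick for a given file.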

Remove Log4j 1.x JMSAppender and SocketServer classes from classpath

Databricks recently published a blog on Log4j 2 Vulnerability (CVE-2021-44228) Research and Assessment. Databricks does not directly use a version of Log4j known to be affected by this vulnerability within the Databricks platform in a way we understand may be vulnerable. Databricks also does not use the affected classes from Log4j 1.x with known vul...

Last updated: May 16th, 2022 by Adam Pavlacka

Replace a default library jar

Databricks includes a number of default Java and Scala libraries. You can replace any of these libraries with another version by using a cluster-scoped init script to remove the default library jar and then install the version you require. Warning Removing default libraries and installing new versions may cause instability or completely break your D...

Last updated: May 16th, 2022 by ram.sankarasubramanian

Python command fails with AssertionError: wrong color format

Problem You run a Python notebook and it fails with an AssertionError: wrong color format message. An example stack trace:   File "/local_disk0/tmp/1599775649524-0/PythonShell.py", line 39, in <module>     from IPython.nbconvert.filters.ansi import ansi2html   File "<frozen importlib._bootstrap>", line 983, in _find_and_load   File "<...

Last updated: May 16th, 2022 by John.Lourdu

PyPMML fails with Could not find py4j jar error

Problem PyPMML is a Python PMML scoring library. After installing PyPMML in a Databricks cluster, it fails with a Py4JError: Could not find py4j jar error. %python from pypmml import Model modelb = Model.fromFile('/dbfs/shyam/DecisionTreeIris.pmml') Error: Py4JError: Could not find py4j jar at Cause This error occurs due to a dependency on the defa...

Last updated: May 16th, 2022 by arjun.kaimaparambilrajan

TensorFlow fails to import

Problem You have TensorFlow installed on your cluster. When you try to import TensorFlow, it fails with an Invalid Syntax or import error. Cause The version of protobuf installed on your cluster is not compatible with your version of TensorFlow. Solution Use a cluster-scoped init script to install TensorFlow with matching versions of NumPy and proto...

Last updated: May 16th, 2022 by kavya.parag

Verify the version of Log4j on your cluster

Databricks recently published a blog on Log4j 2 Vulnerability (CVE-2021-44228) Research and Assessment. Databricks does not directly use a version of Log4j known to be affected by this vulnerability within the Databricks platform in a way we understand may be vulnerable. If you are using Log4j within your cluster (for example, if you are processing ...

Last updated: May 16th, 2022 by Adam Pavlacka
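
The verification steps in the summary above are truncated, so this is not the article's procedure. As a rough sketch, one way to get a first inventory is to scan a directory for jar files whose names follow the common artifact-version.jar convention and report any that start with log4j. The function name, the regular expression, and the filename convention are all assumptions; jars that are renamed or bundled inside other jars will not be found this way.

```python
import re
from pathlib import Path

# Matches names such as log4j-core-2.17.1.jar or log4j-1.2.17.jar (assumed convention).
_LOG4J_JAR = re.compile(r"^(log4j(?:-[a-z0-9]+)*?)-(\d+(?:\.\d+)*)\.jar$")


def find_log4j_jars(directory):
    """Return sorted (artifact, version, path) tuples for Log4j jars under a directory."""
    found = []
    for jar in Path(directory).rglob("*.jar"):
        match = _LOG4J_JAR.match(jar.name)
        if match:
            found.append((match.group(1), match.group(2), str(jar)))
    return sorted(found)
```

The result only reflects file names, so it is a starting point for a manual check rather than proof that a given version is or is not present.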

Apache Spark jobs fail with Environment directory not found error

Problem After you install a Python library (via the cluster UI or by using pip), your Apache Spark jobs fail with an Environment directory not found error message. org.apache.spark.SparkException: Environment directory not found at /local_disk0/.ephemeral_nfs/cluster_libraries/python Cause Libraries are installed on a Network File System (NFS) on th...

Last updated: July 1st, 2022 by Adam Pavlacka


© Databricks 2022. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
