Problem
Trying to run any notebook cell returns a "Failure starting repl." error message.

Failure starting repl. Try detaching and re-attaching the notebook.
When you review the stack trace, it identifies a problem with ipykernel and highlights issues with the Pandas-related check_dependencies and require_minimum_pandas_version functions.
Example stack trace
java.lang.Exception: Unable to start python kernel for ReplId-XXXXX-XXXXX-XXXXX-X, kernel exited with exit code 1.
----- stdout -----
------------------
----- stderr -----
Traceback (most recent call last):
File "/databricks/python_shell/scripts/db_ipykernel_launcher.py", line 19, in <module>
from dbruntime.DatasetInfo import UserNamespaceCommandHook, UserNamespaceDict
File "/databricks/python_shell/dbruntime/DatasetInfo.py", line 8, in <module>
from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
File "/databricks/spark/python/pyspark/sql/connect/dataframe.py", line 24, in <module>
check_dependencies(__name__)
File "/databricks/spark/python/pyspark/sql/connect/utils.py", line 34, in check_dependencies
require_minimum_pandas_version()
File "/databricks/spark/python/pyspark/sql/pandas/utils.py", line 29, in require_minimum_pandas_version
import pandas
File "/databricks/python/lib/python3.10/site-packages/pandas/__init__.py", line 22, in <module>
from pandas.compat import is_numpy_dev as _is_numpy_dev # pyright: ignore # noqa:F401
File "/databricks/python/lib/python3.10/site-packages/pandas/compat/__init__.py", line 18, in <module>
from pandas.compat.numpy import (
File "/databricks/python/lib/python3.10/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
from pandas.util.version import Version
File "/databricks/python/lib/python3.10/site-packages/pandas/util/__init__.py", line 2, in <module>
from pandas.util._decorators import ( # noqa:F401
File "/databricks/python/lib/python3.10/site-packages/pandas/util/_decorators.py", line 14, in <module>
from pandas._libs.properties import cache_readonly
File "/databricks/python/lib/python3.10/site-packages/pandas/_libs/__init__.py", line 13, in <module>
from pandas._libs.interval import Interval
File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
------------------
at com.databricks.backend.daemon.driver.IpykernelUtils$.startReplFailure$1(JupyterDriverLocal.scala:1609)
at com.databricks.backend.daemon.driver.IpykernelUtils$.$anonfun$startIpyKernel$3(JupyterDriverLocal.scala:1619)
at com.databricks.backend.common.util.TimeUtils$.$anonfun$retryWithExponentialBackoff0$1(TimeUtils.scala:191)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.common.util.TimeUtils$.retryWithExponentialBackoff0(TimeUtils.scala:191)
at com.databricks.backend.common.util.TimeUtils$.retryWithExponentialBackoff(TimeUtils.scala:145)
at com.databricks.backend.common.util.TimeUtils$.retryWithTimeout(TimeUtils.scala:94)
at com.databricks.backend.daemon.driver.IpykernelUtils$.startIpyKernel(JupyterDriverLocal.scala:1617)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.$anonfun$startPython$1(JupyterDriverLocal.scala:1314)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.com$databricks$backend$daemon$driver$JupyterDriverLocal$$withRetry(JupyterDriverLocal.scala:1237)
at com.databricks.backend.daemon.driver.JupyterDriverLocal$$anonfun$com$databricks$backend$daemon$driver$JupyterDriverLocal$$withRetry$1.applyOrElse(JupyterDriverLocal.scala:1240)
at com.databricks.backend.daemon.driver.JupyterDriverLocal$$anonfun$com$databricks$backend$daemon$driver$JupyterDriverLocal$$withRetry$1.applyOrElse(JupyterDriverLocal.scala:1237)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at scala.util.Failure.recover(Try.scala:234)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.com$databricks$backend$daemon$driver$JupyterDriverLocal$$withRetry(JupyterDriverLocal.scala:1237)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.startPython(JupyterDriverLocal.scala:1262)
at com.databricks.backend.daemon.driver.PythonDriverLocalBase.$anonfun$startPythonThreadSafe$1(PythonDriverLocalBase.scala:689)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.backend.daemon.driver.MutuallyExclusiveSections.apply(PythonDriverLocalBase.scala:82)
at com.databricks.backend.daemon.driver.PythonDriverLocalBase.startPythonThreadSafe(PythonDriverLocalBase.scala:689)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.<init>(JupyterDriverLocal.scala:623)
at com.databricks.backend.daemon.driver.PythonDriverWrapper.instantiateDriver(DriverWrapper.scala:950)
at com.databricks.backend.daemon.driver.DriverWrapper.setupRepl(DriverWrapper.scala:409)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:289)
at java.lang.Thread.run(Thread.java:750)
Cause
The "Failure starting repl." error can occur when an incompatible version of NumPy and/or Pandas is installed on a Databricks cluster.
Note
You should verify that the stack trace references version issues with NumPy and/or Pandas, as the "Failure starting repl." error message can have additional causes.
If a Python package depends on a specific NumPy and/or Pandas version, and those packages are updated to an incompatible version, the package can break at import time and your jobs fail with Python errors. This can happen after a new release of NumPy and/or Pandas.
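Because the REPL itself fails to start, you cannot diagnose this from a notebook cell. You can, however, reproduce the failing import from the cluster's web terminal; a minimal check, assuming the driver Python lives under the /databricks/python path shown in the stack trace:

# Reproduce the failing import outside the notebook REPL.
# /databricks/python is the path from the stack trace above; adjust if yours differs.
/databricks/python/bin/python -c "import pandas"

If the installed versions are binary-incompatible, this prints the same ValueError: numpy.dtype size changed traceback shown above.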
Solution
Ensure you are using versions of NumPy and/or Pandas that are compatible with your selected Databricks Runtime version. The default versions of these (and other libraries) are detailed in the Databricks Runtime release notes versions and compatibility (AWS, Azure, GCP) documentation.
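To compare against the release notes, you can list the versions actually installed on the cluster from the web terminal. A sketch, again assuming the /databricks/python path from the stack trace:

# List the installed NumPy and Pandas versions on the driver.
/databricks/python/bin/pip list | grep -iE 'numpy|pandas'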
If you need a specific version of NumPy and/or Pandas, you should pin the version using an init script (AWS, Azure, GCP) or by installing the specific version as a cluster library (AWS, Azure, GCP).
For example, if you want to pin NumPy 1.26.4 and Pandas 2.2.2 in an init script, you should include the following line:
pip install numpy==1.26.4 pandas==2.2.2
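A complete init script wrapping that line could look like the following minimal sketch (the shebang and set -e are ordinary shell-script boilerplate, not Databricks requirements):

#!/bin/bash
# Fail the cluster start early if the install fails.
set -e
# Pin NumPy and Pandas to versions compatible with each other and the runtime.
pip install numpy==1.26.4 pandas==2.2.2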
If you want to install NumPy 1.26.4 and Pandas 2.2.2 as cluster libraries, you should specify the versions when installing them via the API or workspace UI.
numpy==1.26.4
pandas==2.2.2
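If you install them via the API, the same pins go into the Libraries API request body. A sketch using curl, with placeholder workspace URL, token, and cluster ID:

# Install pinned versions as cluster libraries via the Libraries API.
curl -X POST https://<workspace-url>/api/2.0/libraries/install \
  -H "Authorization: Bearer <personal-access-token>" \
  -d '{
    "cluster_id": "<cluster-id>",
    "libraries": [
      {"pypi": {"package": "numpy==1.26.4"}},
      {"pypi": {"package": "pandas==2.2.2"}}
    ]
  }'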