Cluster fails with a Fatal uncaught exception: Failed to bind error.

If other software uses port 6062, it can conflict with the IPython kernel REPL and prevent the driver node from starting.

Written by simran.arora

Last published at: July 17th, 2023

Problem

Clusters running Databricks Runtime 11.3 LTS or above terminate with a Failed to bind error message.

Fatal uncaught exception. Terminating driver.
java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:6062

Cause

This can happen if multiple processes attempt to use the same port. Databricks Runtime 11.3 LTS and above use the IPython kernel (AWS | Azure | GCP) as the default REPL on port 6062.

If you have other software configured to run on the same port, it can result in a conflict (for example, the Datadog agent is usually configured to use port 6062). If a conflict occurs, the driver node may fail to start.
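
If you are not sure which process is occupying the port, you can confirm the conflict directly on the driver. The following snippet is a minimal sketch using only Python's standard library; it assumes you run it from a cluster-scoped init script or the driver's web terminal, and it checks the same port (6062) that appears in the error message.

# Minimal sketch: check whether something is already listening on port 6062
# on the driver before the IPython kernel attempts to bind to it.
import socket

PORT = 6062  # port used by the IPython kernel REPL on Databricks Runtime 11.3 LTS and above

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    try:
        s.bind(("0.0.0.0", PORT))
        print(f"Port {PORT} is free")
    except OSError as err:
        # Another process (for example, a monitoring agent) already holds the port.
        print(f"Port {PORT} is already in use: {err}")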

Solution

As a workaround, you can configure the cluster to use the standard Python shell as the default REPL in the cluster's Spark config (AWS | Azure | GCP).

spark.databricks.python.defaultPythonRepl pythonshell


This prevents the cluster from using the IPython kernel, so there is no port conflict and the driver node starts successfully.
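
After the cluster starts, you can optionally confirm that the setting was applied. The snippet below is a minimal sketch that reads the Spark config from a notebook cell; it assumes the cluster-level key is readable at runtime in your workspace.

# Minimal sketch: confirm the default Python REPL setting from a notebook.
# "spark" is the SparkSession that Databricks notebooks provide automatically.
repl = spark.conf.get("spark.databricks.python.defaultPythonRepl", "not set")
print(f"Default Python REPL: {repl}")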