Problem
Clusters running Databricks Runtime 11.3 LTS or above terminate at startup with a "Failed to bind" error message similar to the following:
Fatal uncaught exception. Terminating driver. java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:6062
Cause
This can happen when multiple processes attempt to use the same port. Databricks Runtime 11.3 LTS and above use the IPython kernel as the default REPL, and the kernel binds to port 6062.
If other software on the cluster is configured to use the same port, the two processes conflict. For example, Datadog is commonly configured to use port 6062. When such a conflict occurs, the driver node may fail to start.
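To see why the second process fails, consider a minimal Python sketch of the same kind of bind the driver performs. It assumes a Linux-like host and uses only the standard socket and errno modules. The helper name port_is_free is hypothetical and is not part of Databricks. A bind that fails with EADDRINUSE means another process already holds the port. Note that the probe can also report a port as busy if a recently closed connection is still lingering in TIME_WAIT.

```python
import errno
import socket

# Port the IPython kernel REPL binds to on Databricks Runtime 11.3 LTS and above.
REPL_PORT = 6062


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Hypothetical helper: return True if a TCP socket can bind to (host, port) now.

    A bind failing with EADDRINUSE is the same condition behind the
    "Failed to bind to 0.0.0.0/0.0.0.0:6062" error: another process holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # other errors (for example, permissions) are not a port conflict
        return True


if __name__ == "__main__":
    # Simulate a conflicting process (such as a monitoring agent) by holding
    # an OS-assigned port open in a listening socket, then probing that port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as other:
        other.bind(("0.0.0.0", 0))
        other.listen()
        taken = other.getsockname()[1]
        print(f"port {taken} free while held by another socket: {port_is_free(taken)}")

    # Check the REPL port on the current host.
    print(f"port {REPL_PORT} free: {port_is_free(REPL_PORT)}")
```

Run on a host, the first line should report False because the simulated process still holds the port. This mirrors the conflict the driver hits when another agent is already listening on 6062.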
Solution
As a workaround, you can configure the cluster to use the standard Python shell as the default REPL. To do this, add the following line to the cluster's Spark config:
spark.databricks.python.defaultPythonRepl pythonshell
This prevents the cluster from using the IPython kernel, so no process competes for port 6062 and the driver node starts successfully.
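If you manage clusters as JSON specs instead of through the UI, the same setting goes into the spec's "spark_conf" mapping, which the Databricks Clusters API uses for Spark config key/value pairs. The following is a minimal sketch under that assumption. The function with_python_shell_repl and the example spec are hypothetical and are shown only for illustration. The function only edits a Python dict and does not call any Databricks API.

```python
import json

# Spark config key and value from this article.
REPL_CONF_KEY = "spark.databricks.python.defaultPythonRepl"
REPL_CONF_VALUE = "pythonshell"


def with_python_shell_repl(cluster_spec: dict) -> dict:
    """Hypothetical helper: return a copy of cluster_spec using the Python shell REPL.

    Assumes Spark config lives under a "spark_conf" mapping of string keys
    to string values. Existing Spark config entries are preserved, and the
    input dict is not modified.
    """
    updated = dict(cluster_spec)
    spark_conf = dict(updated.get("spark_conf") or {})
    spark_conf[REPL_CONF_KEY] = REPL_CONF_VALUE
    updated["spark_conf"] = spark_conf
    return updated


if __name__ == "__main__":
    # Hypothetical, abbreviated cluster spec used only for illustration.
    spec = {
        "cluster_name": "example-cluster",
        "spark_version": "11.3.x-scala2.12",
        "spark_conf": {"spark.speculation": "true"},
    }
    print(json.dumps(with_python_shell_repl(spec), indent=2))
```

Copying the spec rather than mutating it lets you compare the original and updated configurations before you apply the change to a cluster.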