Cluster fails to launch with a Bootstrap Timeout error

Verify allowlisting of necessary services and correct configuration of your VPC/VNet.

Written by parth.sundarka

Last published at: January 20th, 2025

Problem

Clusters in your workspace are failing to launch with a Bootstrap Timeout error message.

 

Cause

This issue can occur due to any one of the following reasons:

  • Firewall restrictions
  • Firewall throttling
  • Incorrect virtual network configuration 

 

Solution

  • Check the Databricks service status page (AWS | Azure | GCP) for any known issues in your region.
  • Verify that all required Databricks FQDNs and IP addresses are allowlisted in the firewall for your VPC/VNet. Make sure to allowlist:
    • Control plane IPs
    • Metastore
    • Artifact blob storage primary
    • Artifact blob storage secondary
    • System tables storage
    • Log blob storage
    • Event Hubs endpoint listed in the documentation
    For more details, review the IP addresses and domains for Databricks services and assets (AWS | Azure | GCP) documentation.
  • Verify that you have met all the requirements for a customer-managed VPC/VNet for all of the subnets you're using with Databricks. For more details, review the Configure a customer-managed VPC/Azure virtual network (VNet injection) (AWS | Azure | GCP) documentation.
  • Check with your internal infrastructure team to determine whether the firewall is throttling traffic between your cluster subnets and the Databricks services listed above.
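
The allowlist and throttling checks above can be partly automated. The following is a minimal sketch, not an official Databricks tool: it assumes you run it from a machine in the same subnet as the cluster nodes and that you supply the FQDNs and ports from the documentation for your region. The function names (probe, probe_many, summarize) are hypothetical and introduced only for illustration. Repeating each probe several times can help surface intermittent failures or slow connections, which may point to firewall throttling rather than a missing allowlist entry.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection; return (ok, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        ok = False
    return ok, time.monotonic() - start


def probe_many(endpoints, attempts=3, timeout=5.0):
    """Probe each (host, port) pair several times.

    Returns {(host, port): [(ok, seconds), ...]}.
    """
    return {
        (host, port): [probe(host, port, timeout) for _ in range(attempts)]
        for host, port in endpoints
    }


def summarize(results):
    """Return (failure_count, slowest_seconds) for one endpoint's results."""
    failures = sum(1 for ok, _ in results if not ok)
    slowest = max(secs for _, secs in results)
    return failures, slowest
```

For example, calling probe_many with a list such as [("your-control-plane-fqdn", 443)] and passing each result list to summarize gives a failure count and worst-case connect time per endpoint. An endpoint that always fails suggests a missing allowlist entry; one that fails only sometimes, or connects slowly, is worth raising with your infrastructure team.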

 

On AWS, if your workspace has PrivateLink enabled, review the Enable private connectivity using AWS PrivateLink documentation and verify that DNS is enabled and that the VPC endpoint is in the approved state.
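
One way to spot-check the DNS part is to resolve the workspace hostname from inside the VPC and look at the addresses returned. The sketch below rests on an assumption that is not stated in this article: that with PrivateLink and DNS enabled, the hostname resolves to private addresses when queried from within the VPC. The helper names (resolve, all_private) are hypothetical. Treat a public address in the result as a hint to recheck the endpoint's DNS setting, not as proof of a misconfiguration.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for hostname."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True if the list is non-empty and every address is in a private range."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )
```

Running all_private(resolve("<your-workspace-hostname>")) on a host in the workspace VPC returns True only when every resolved address is private.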