UnknownHostException on cluster launch

Troubleshoot an UnknownHostException on cluster launch. This is often a DNS configuration issue.

Written by arnab.saha

Last published at: December 8th, 2022

Problem

When you launch an Azure Databricks cluster, you get an UnknownHostException error.

You may also get one of the following error messages:

  • Error: There was an error in the network configuration. databricks_error_message: Could not access worker artifacts.
  • Error: Temporary failure in name resolution.
  • Internal error message: Failed to launch spark container on instance XXX. Exception: Could not add container for XXX with address X.X.X.X.mysql.database.azure.com: Temporary failure in name resolution.

Cause

These errors indicate an issue with DNS settings.

  • The primary DNS server could be down or unresponsive.
  • The hostnames of the artifacts storage account are not being resolved, which causes the cluster launch to fail.
  • A static host record for the artifacts storage account may point to a public IP address that has since changed.
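To check for the stale static record described in the last cause, a minimal sketch along these lines may help. It lists the IP addresses that a hosts file pins to a given hostname. The function name `static_host_entries` is hypothetical, and the sketch assumes a standard hosts-file layout (IP address first, then one or more names).

```shell
# List the IP addresses that a hosts file statically pins to a hostname.
# Usage: static_host_entries HOSTNAME [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
# Note: static_host_entries is an illustrative helper, not part of any Databricks tooling.
static_host_entries() {
  local host="$1" file="${2:-/etc/hosts}"
  # Drop comments, then print the first field (the IP) of every line
  # that lists the hostname in any later field (case-insensitive).
  awk -v h="$host" '
    { sub(/#.*/, "") }
    { for (i = 2; i <= NF; i++) if (tolower($i) == tolower(h)) { print $1; break } }
  ' "$file"
}

# Example:
#   static_host_entries dbartifactsprodeastus.blob.core.windows.net
```

If the function prints an address, compare it with what a public DNS lookup returns for the same hostname from a machine that has no static entry; a difference suggests the pinned record is out of date.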

Solution

Identify a working DNS server and update the DNS entry on the cluster.

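  1. Start a standalone Azure VM and verify that the artifacts blob storage account is reachable from the instance. The example uses the East US artifacts account; substitute the hostname for your region if it differs.
    `telnet dbartifactsprodeastus.blob.core.windows.net 443`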
  2. Verify that you can reach your primary DNS server from a notebook by running a ping command.
  3. If your DNS server is not responding, try to reach your secondary DNS server from a notebook by running a ping command.
  4. Launch a Web Terminal from the cluster workspace.
  5. Edit the /etc/resolv.conf file on the cluster.
  6. Update the nameserver value with your working DNS server.
  7. Save the changes to the file.
  8. Restart systemd-resolved.
    $ sudo systemctl restart systemd-resolved.service
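The checks in steps 1 through 3 can be approximated with the sketch below. It is not the article's procedure verbatim: it swaps `telnet` for bash's `/dev/tcp` redirection so that no extra package is needed, and the helper names `list_nameservers` and `tcp_reachable` are hypothetical. It assumes bash and the coreutils `timeout` command are available on the instance.

```shell
# Print the nameserver addresses configured in a resolv.conf-style file.
# Usage: list_nameservers [FILE]   (FILE defaults to /etc/resolv.conf)
list_nameservers() {
  local file="${1:-/etc/resolv.conf}"
  awk '$1 == "nameserver" { print $2 }' "$file"
}

# Return 0 if a TCP connection to HOST:PORT opens within 5 seconds.
# Relies on bash's /dev/tcp redirection and coreutils timeout instead of telnet.
tcp_reachable() {
  local host="$1" port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example, from the web terminal or a %sh notebook cell:
#   tcp_reachable dbartifactsprodeastus.blob.core.windows.net 443 && echo "artifacts reachable"
#   for ns in $(list_nameservers); do ping -c 3 "$ns"; done
```

Listing the configured nameservers first makes it easier to see which server the cluster is actually using before you edit `/etc/resolv.conf`.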

Info

This is a temporary change to the DNS and will be lost on cluster restart. After verifying that the custom DNS settings are correct, you can configure custom DNS settings using dnsmasq to make the change permanent.
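As a rough illustration of what the dnsmasq configuration might contain, the sketch below prints two `server=` lines: one that forwards a single domain to a custom DNS server and one that sends everything else to a default resolver. The function name `render_dnsmasq_conf`, the domain `example.internal`, and the address `10.0.0.4` are all hypothetical placeholders. Installing dnsmasq and wiring it into the cluster is not shown here; follow the dnsmasq procedure referenced above for that.

```shell
# Print dnsmasq configuration lines for a split DNS setup.
# Usage: render_dnsmasq_conf DOMAIN CUSTOM_DNS DEFAULT_DNS
#   server=/DOMAIN/CUSTOM_DNS  forwards queries for DOMAIN to CUSTOM_DNS
#   server=DEFAULT_DNS         forwards all other queries to DEFAULT_DNS
# Note: this helper only prints text; it does not modify any file.
render_dnsmasq_conf() {
  local domain="$1" custom_dns="$2" default_dns="$3"
  printf 'server=/%s/%s\n' "$domain" "$custom_dns"
  printf 'server=%s\n' "$default_dns"
}

# Example (placeholder domain and custom DNS address):
#   render_dnsmasq_conf example.internal 10.0.0.4 168.63.129.16
```

Keeping the rendering separate from any file write lets you review the generated lines before they are appended to the dnsmasq configuration.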

Further troubleshooting

If you are still having DNS issues, you should try the following steps:

  • Verify that port 43 (used for whois) and port 53 (used for DNS) are open in your firewall.
  • Add the Azure recursive resolver (168.63.129.16) to the default DNS forwarder. Review the VMs and role instances documentation for more information.
  • Verify that nslookup returns identical results from your laptop and from the default DNS server. If there is a mismatch, your DNS server may have an incorrect host record.
  • Verify that everything works with a default Azure DNS server. If it works with Azure DNS, but fails with your custom DNS, your DNS admin should review your DNS server settings.
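The comparison of lookup results can be scripted roughly as follows. This sketch uses `dig` rather than `nslookup` because its `+short` output is easier to compare; it assumes `dig` is installed, and the helper names `resolve_with` and `same_answers` are hypothetical.

```shell
# Resolve HOST against a specific DNS SERVER and print the sorted IPv4 answers.
# Usage: resolve_with HOST SERVER   (requires dig)
resolve_with() {
  local host="$1" server="$2"
  # +short may also print CNAME targets, so keep only lines that look like IPv4 addresses.
  dig +short "$host" A "@${server}" | grep -E '^[0-9]+(\.[0-9]+){3}$' | sort -u
}

# Succeed (exit 0) only when two newline-separated answer lists hold the same addresses.
same_answers() {
  [ "$(printf '%s\n' "$1" | sort -u)" = "$(printf '%s\n' "$2" | sort -u)" ]
}

# Example (10.0.0.4 stands in for your custom DNS server):
#   custom=$(resolve_with dbartifactsprodeastus.blob.core.windows.net 10.0.0.4)
#   azure=$(resolve_with dbartifactsprodeastus.blob.core.windows.net 168.63.129.16)
#   same_answers "$custom" "$azure" || echo "DNS answers differ"
```

A difference is a prompt to inspect the custom server's records, not proof of a fault, since some services can legitimately return different addresses over time. Two empty answer lists also compare as equal, so check that each lookup returned something.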

