Problem
When you attempt to connect to Snowflake from a Databricks environment, whether to read from or write to Snowflake tables or to execute Snowflake queries, you receive the following error message.
Connection to Snowflake timed out.
Cause
Connectivity issues between Databricks and Snowflake can arise from various factors, including incorrect network configurations such as firewall rules and VPC settings, DNS resolution problems, misconfigured proxy settings, account identifier mismatches, high network latency, and SSL/TLS certificate validation issues.
Solution
To effectively diagnose these challenges, Snowflake offers the Snowflake Connectivity Diagnostic (SnowCD) utility, which you can integrate into your Databricks environment for comprehensive network testing.
Get SnowCD ready in Databricks
First, create an init script to download and install SnowCD on your Databricks cluster.
Open a new text file in your workspace, and paste in the following code snippet.
#!/bin/bash
wget https://sfc-repo.snowflakecomputing.com/snowcd/linux_amd64/latest/snowcd -O /usr/local/bin/snowcd
chmod +x /usr/local/bin/snowcd
Next, save this script in your workspace files as install_snowcd.sh. Upload the script file directly to the desired location in your workspace, ensuring the path follows the format “/Workspace/<folder>/install_snowcd.sh”.
Then, configure your Databricks cluster to use the init script.
- Go to your cluster and click it to open its configuration UI.
- Under Advanced Options, select Init Scripts.
- Add the path to your script:
/Workspace/<folder>/install_snowcd.sh
- Save your cluster settings.
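After the cluster restarts, you may want to confirm that the init script actually placed the binary where expected. The following is a minimal sketch to run in a notebook cell. It assumes the install location from the script above (/usr/local/bin/snowcd). The helper name is_executable_file is illustrative and not part of any Databricks or Snowflake API.

```python
import os

def is_executable_file(path: str) -> bool:
    """Return True if `path` is an existing file with the executable bit set."""
    return os.path.isfile(path) and os.access(path, os.X_OK)

# Install location used by the init script above.
SNOWCD_PATH = "/usr/local/bin/snowcd"

if is_executable_file(SNOWCD_PATH):
    print(f"SnowCD is installed at {SNOWCD_PATH}")
else:
    print(f"SnowCD was not found at {SNOWCD_PATH}; review the init script logs")
```

If the binary is missing, the most likely causes are an incorrect init script path or a failed download during cluster startup.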
Obtain Snowflake endpoints
This step is like getting a map of all the places Snowflake lives on the internet.
First, connect to your Snowflake account using the web interface or SnowSQL.
Next, execute the following SQL query.
SELECT SYSTEM$ALLOWLIST();
Note
If you're using a private link connection, use SYSTEM$ALLOWLIST_PRIVATELINK() instead.
Last, save the JSON output to a file named allowlist.json in your Databricks workspace.
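Before handing the file to SnowCD, it can help to confirm that the saved output is valid JSON and has the expected shape. The sketch below assumes the allowlist is a JSON array of objects with "type", "host", and "port" keys. The function name summarize_allowlist and the sample hosts are hypothetical and used only for illustration.

```python
import json
from collections import Counter

REQUIRED_KEYS = {"type", "host", "port"}

def summarize_allowlist(raw_json: str) -> Counter:
    """Validate the allowlist structure and count endpoints per type."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of endpoint objects")
    counts = Counter()
    for entry in entries:
        if not isinstance(entry, dict) or not REQUIRED_KEYS <= entry.keys():
            raise ValueError(f"malformed allowlist entry: {entry!r}")
        counts[entry["type"]] += 1
    return counts

# Hypothetical sample; in practice, read the contents of allowlist.json instead.
sample = (
    '[{"type": "SNOWFLAKE_DEPLOYMENT", "host": "example.snowflakecomputing.com", "port": 443},'
    ' {"type": "OCSP_CACHE", "host": "ocsp.example.com", "port": 80}]'
)
print(summarize_allowlist(sample))
```

A ValueError or json.JSONDecodeError at this stage usually means the query result was truncated or copied with extra quoting.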
Run SnowCD in Databricks
Use SnowCD to check the connection.
First, create a new notebook in Databricks and use the following Python code to run SnowCD.
import subprocess
import json
# Path to the allowlist.json file in Databricks
allowlist_path = "/dbfs/path/to/your/allowlist.json"
# Run SnowCD
result = subprocess.run(["snowcd", allowlist_path], capture_output=True, text=True)
# Print the output
print(result.stdout)
# Check for errors
if result.returncode != 0:
    print("Error occurred:")
    print(result.stderr)
else:
    print("All checks passed successfully")
After the test runs, the notebook cell displays the SnowCD output. If all checks pass, you'll see "All checks passed successfully".
If there are issues, SnowCD will provide detailed error messages about which endpoints couldn't be reached.
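Because the underlying problem is a timeout, the diagnostic run itself can hang or fail to start. The following sketch wraps the subprocess call so that a missing binary or a stalled run produces a readable message instead of an exception. The helper name run_with_timeout and the 120-second default are assumptions. The demonstration uses the Python interpreter as a stand-in command so the example runs anywhere.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s=120):
    """Run `cmd` and return (returncode, stdout, stderr).

    returncode is None when the command is missing or exceeds the timeout.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError:
        return None, "", f"{cmd[0]} not found; was the init script applied?"
    except subprocess.TimeoutExpired:
        return None, "", f"{cmd[0]} did not finish within {timeout_s} seconds"
    return result.returncode, result.stdout, result.stderr

# Stand-in command for demonstration; on the cluster you would pass
# ["snowcd", allowlist_path] instead.
code, out, err = run_with_timeout([sys.executable, "-c", "print('ok')"])
print(code, out.strip(), err)
```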
Troubleshoot based on SnowCD output
- If SnowCD reports DNS lookup failures, work with your internal network team to ensure proper DNS resolution for Snowflake endpoints.
- For connection failures, review and update firewall rules to allow traffic to Snowflake IP addresses and ports identified in the allowlist.json file.
- If using a proxy, ensure it's correctly configured in your Databricks environment and can handle Snowflake connections.
- For SSL/TLS-related errors, verify that your Databricks cluster supports the required TLS version and cipher suites for Snowflake.
- For account identifier issues, ensure you're using the correct Snowflake account identifier in your connection strings.
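To narrow down whether a failing endpoint is a DNS problem or a firewall problem, you can probe a single host and port directly from a notebook. This is a rough sketch using only the Python standard library and is not a replacement for SnowCD. The function name check_endpoint and the 5-second default timeout are assumptions.

```python
import socket

def check_endpoint(host: str, port: int, timeout: float = 5.0) -> str:
    """Return "ok", or a message saying whether DNS or the TCP connection failed."""
    try:
        # Step 1: name resolution only.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS failure: {exc}"
    try:
        # Step 2: TCP handshake to the resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return f"connection failure: {exc}"
```

For example, calling check_endpoint(entry["host"], entry["port"]) for each entry loaded from allowlist.json gives a per-endpoint result. A "DNS failure" points to the DNS item above, and a "connection failure" points to firewall or proxy rules.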
For more information, refer to Snowflake’s SnowCD (Connectivity Diagnostic Tool) documentation.