Problem
You configure the following proxy settings in a notebook.
HTTP_PROXY=http://10.24.xxx.xxx:443
HTTPS_PROXY=https://10.24.xxx.xxx:443
You then encounter issues such as the following.
- Error messages indicating failed connections to external repositories.
- Failure to download libraries or dependencies.
- Inability to access certain websites or services due to proxy configuration issues.
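To see which of these proxy variables are actually set, you can list them from a shell on the driver, for example a %sh cell. This is a minimal sketch: the helper name show_proxy_env is hypothetical, and it only reports variables visible to the shell you run it from.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the proxy-related environment variables visible
# to this shell (HTTP_PROXY, HTTPS_PROXY, NO_PROXY, in any letter case), sorted.
# Prints nothing if none are set.
show_proxy_env() {
  # awk matches on a lowercased copy of each line but prints the original line.
  env | awk 'tolower($0) ~ /^(https?|no)_proxy=/' | sort
}

show_proxy_env
```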
Cause
The required fully qualified domain names (FQDNs) are not allowlisted in the firewall or proxy server, so requests that the proxy settings route to external repositories are blocked.
You may also, or instead, have unnecessary Apache Spark configuration properties on the cluster or incorrect proxy settings. The specific cause depends on the technical details of your Databricks environment.
Solution
Allowlist the required FQDNs, remove unnecessary Spark configuration properties, then configure and test your proxy settings. Finally, verify connectivity.
Allowlist required FQDNs
Ensure that the required FQDNs are allowlisted in your organization’s proxy server configuration so that the cluster can reach external repositories. If you do not manage the proxy servers, contact the team in your organization that does.
Remove unnecessary properties
Review your cluster Spark config properties and remove unnecessary ones. For details on how to modify Spark configs, refer to the “Spark configuration” section of the Compute configuration reference (AWS | Azure | GCP) documentation.
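If you copy the cluster's Spark config into a text file (one "key value" property per line, as entered in the cluster settings), a simple filter can help you start the review. This sketch assumes the properties worth checking first are proxy-related ones; the helper name list_proxy_props and the file name spark-config.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Spark config properties that mention a proxy,
# from a file holding one "key value" property per line.
list_proxy_props() {
  local file="$1"
  # Drop blank lines and comment lines, then keep lines containing "proxy" (any case).
  grep -vE '^[[:space:]]*(#|$)' "$file" | grep -i 'proxy'
}

# Example usage (assumes the config was saved to spark-config.txt):
# list_proxy_props spark-config.txt
```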
Configure proxy settings
Configure the proxy settings at the cluster level by adding the following properties to your cluster's Spark config, one property per line. Replace 10.24.132.XXX with the IP address of your proxy server, and replace *.example.com with the hostname of the service that should bypass the proxy. For example, use *.snowflakecomputing.com for Snowflake services.
spark.driver.extraJavaOptions -Dhttp.proxyHost=10.24.132.XXX -Dhttp.proxyPort=80 -Dhttps.proxyHost=10.24.132.XXX -Dhttps.proxyPort=443 -Dhttp.nonProxyHosts=*.example.com -Dhttps.nonProxyHosts=*.example.com
spark.executor.extraJavaOptions -Dhttp.proxyHost=10.24.132.XXX -Dhttp.proxyPort=80 -Dhttps.proxyHost=10.24.132.XXX -Dhttps.proxyPort=443 -Dhttp.nonProxyHosts=*.example.com -Dhttps.nonProxyHosts=*.example.com
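Because the driver and executor values must match and are easy to mistype, you may prefer to generate the option string once. This is a sketch with a hypothetical helper name; it mirrors the properties above and joins several bypass hosts with "|", the separator Java expects in http.nonProxyHosts. Check that the resulting value behaves as expected on your cluster before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the single-line value for
# spark.driver.extraJavaOptions and spark.executor.extraJavaOptions.
# Usage: build_java_proxy_opts <proxy-host> <http-port> <https-port> <non-proxy-host>...
build_java_proxy_opts() {
  local host="$1" http_port="$2" https_port="$3"
  shift 3
  local non_proxy
  # Join the remaining arguments with "|" (Java's nonProxyHosts list separator).
  non_proxy="$(IFS='|'; printf '%s' "$*")"
  printf '%s' "-Dhttp.proxyHost=$host -Dhttp.proxyPort=$http_port" \
    " -Dhttps.proxyHost=$host -Dhttps.proxyPort=$https_port" \
    " -Dhttp.nonProxyHosts=$non_proxy -Dhttps.nonProxyHosts=$non_proxy"
  printf '\n'
}

# Example: print both Spark config lines (replace the placeholder address).
opts="$(build_java_proxy_opts 10.24.132.XXX 80 443 '*.example.com')"
printf 'spark.driver.extraJavaOptions %s\n' "$opts"
printf 'spark.executor.extraJavaOptions %s\n' "$opts"
```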
Test proxy access
Run a telnet test from the cluster to check whether it can reach your proxy, for example from a %sh cell in a notebook attached to the cluster. The following command provides an example.
telnet 10.24.132.XXX 443
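telnet is interactive and may not be installed on every cluster image. As a non-interactive alternative, the following sketch uses bash's built-in /dev/tcp redirection with the coreutils timeout command to report whether a TCP connection to the proxy opens. The helper name check_tcp and the 5-second default timeout are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive TCP reachability check (telnet alternative).
# Returns 0 and prints "reachable" if host:port accepts a connection in time.
check_tcp() {
  local host="$1" port="$2" secs="${3:-5}"
  # A child bash opens fd 3 on /dev/tcp/<host>/<port>; timeout kills it if it hangs.
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# Example (replace with your proxy address and port):
# check_tcp 10.24.132.XXX 443
```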
Verify connectivity
After you complete the adjustments and checks above, verify that the cluster can reach external repositories by running the following two netcat commands and the curl command.
netcat test directly to the target hostname
nc -vz <hostname> 443
netcat test with proxy
nc -vz -X connect -x <proxy-address>:<proxy-port> <target-host> <target-port>
The following code provides an example.
nc -vz -X connect -x 127.0.0.1:8080 google.com 443
curl test with proxy
curl --proxy http://<proxy-address>:<proxy-port> https://<endpoint-of-interest>
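To run the same curl check against several endpoints at once, you could wrap it in a loop that prints the HTTP status code curl reports for each one. This is a sketch: the helper name is hypothetical, and the example endpoints in the usage comment are placeholders for whichever repositories your workloads need.

```shell
#!/usr/bin/env bash
# Hypothetical helper: curl each URL through the proxy and print its status code.
# Usage: check_endpoints_via_proxy <proxy-address>:<proxy-port> <url>...
# Returns non-zero if any curl invocation failed.
check_endpoints_via_proxy() {
  local proxy="$1" url code failed=0
  shift
  for url in "$@"; do
    # -s: quiet, -o /dev/null: discard the body, -w: print only the status code.
    # 000 means no HTTP response (for example, the proxy refused the connection).
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
      --proxy "http://$proxy" "$url")" || failed=1
    echo "$url -> ${code:-000}"
  done
  return "$failed"
}

# Example (replace with your proxy and endpoints):
# check_endpoints_via_proxy 10.24.132.XXX:443 https://pypi.org https://repo1.maven.org
```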