When working with Python, you may want to import a custom CA certificate to avoid connection errors to your endpoints, such as the following:

```
ConnectionError: HTTPSConnectionPool(host='my_server_endpoint', port=443): Max retries exceeded with url: /endpoint (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fb73dc3b3d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
```
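For context, a call like the following is the kind of request that can surface this error when the cluster does not trust your endpoint's certificate authority. This is a minimal sketch; the host and path are placeholders taken from the error message above and stand in for your own internal endpoint.

```python
import requests

# 'my_server_endpoint' and '/endpoint' are placeholders from the error message
# above; substitute your own internal HTTPS endpoint. Without the custom CA in
# the cluster's trust store, a call like this can fail.
response = requests.get("https://my_server_endpoint/endpoint", timeout=30)
print(response.status_code)
```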
To import one or more custom CA certificates to your Databricks cluster:
- Create an init script that adds the entire CA chain and sets the REQUESTS_CA_BUNDLE environment variable.
In this example, PEM format CA certificates are added to the file myca.crt, which is located at /usr/local/share/ca-certificates/. This file is referenced in the custom-cert.sh init script. Once you have created the init script, you can delete the certificates from that location.
```python
dbutils.fs.put("/databricks/init-scripts/custom-cert.sh", """#!/bin/bash

cat << 'EOF' > /usr/local/share/ca-certificates/myca.crt
-----BEGIN CERTIFICATE-----
<CA CHAIN 1 CERTIFICATE CONTENT>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<CA CHAIN 2 CERTIFICATE CONTENT>
-----END CERTIFICATE-----
EOF

update-ca-certificates

PEM_FILE="/etc/ssl/certs/myca.pem"
PASSWORD="<password>"
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
KEYSTORE="$JAVA_HOME/lib/security/cacerts"

CERTS=$(grep 'END CERTIFICATE' $PEM_FILE | wc -l)

# To process multiple certs with keytool, you need to extract
# each one from the PEM file and import it into the Java KeyStore.
for N in $(seq 0 $(($CERTS - 1))); do
  ALIAS="$(basename $PEM_FILE)-$N"
  echo "Adding to keystore with alias:$ALIAS"
  cat $PEM_FILE |
    awk "n==$N { print }; /END CERTIFICATE/ { n++ }" |
    keytool -noprompt -import -trustcacerts \
      -alias $ALIAS -keystore $KEYSTORE -storepass $PASSWORD
done

echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
""")
```
To use your custom CA certificates with DBFS FUSE (AWS | Azure | GCP), add this line to the bottom of your init script: `/databricks/spark/scripts/restart_dbfs_fuse_daemon.sh`
- Attach the init script to the cluster as a cluster-scoped init script (AWS | Azure | GCP).
- Restart the cluster.
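After the cluster restarts, you can sanity-check the configuration from a notebook. This is a minimal sketch, assuming the init script above ran successfully; the endpoint URL is a placeholder for your own server.

```python
import os
import requests

# The init script exports REQUESTS_CA_BUNDLE in spark-env.sh, so the driver
# environment should point at the system bundle that now includes your CA chain.
print(os.environ.get("REQUESTS_CA_BUNDLE"))  # expect /etc/ssl/certs/ca-certificates.crt

# requests honors REQUESTS_CA_BUNDLE automatically; passing verify= explicitly
# is optional and shown here only to make the bundle path visible.
response = requests.get(
    "https://my_server_endpoint/endpoint",  # placeholder for your endpoint
    verify="/etc/ssl/certs/ca-certificates.crt",
)
print(response.status_code)
```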