When working with Python, you may want to import a custom CA certificate to avoid connection errors to your endpoints, such as:
ConnectionError: HTTPSConnectionPool(host='my_server_endpoint', port=443): Max retries exceeded with url: /endpoint (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fb73dc3b3d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Similarly, you may need custom certificates to be added to the default Java cacerts in order to access different endpoints with Apache Spark JVMs.
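Before changing any cert stores, it can help to confirm that the failure is a trust problem rather than a network problem. The following is a minimal check, where my_server_endpoint stands in for your endpoint and /path/to/custom-ca.pem is a hypothetical path to your CA chain in PEM format:
# Inspect the TLS handshake; "unable to get local issuer certificate" in the
# verify return code means this machine does not trust the server's CA yet.
openssl s_client -connect my_server_endpoint:443 </dev/null 2>/dev/null | grep 'Verify return code'
# The same request should succeed once the custom CA is supplied explicitly.
curl --cacert /path/to/custom-ca.pem https://my_server_endpoint/endpoint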
Instructions
To import one or more custom CA certificates to your Databricks compute, you can create an init script that adds the entire CA certificate chain to both the Linux SSL and Java default cert stores, and that sets the REQUESTS_CA_BUNDLE and SSL_CERT_FILE environment variables.
The resulting init script can be configured as a cluster-scoped init script or a global init script.
For more information on configuring init scripts, please review the What are init scripts? (AWS | Azure | GCP) documentation.
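For example, one way to stage the script for a cluster-scoped init script is to copy it to storage with the Databricks CLI and then reference that destination in the cluster's init script settings. The paths below are only placeholders; the destination type to use (workspace file, volume, or cloud storage) depends on your workspace setup:
# Copy the init script to a DBFS location (example path; adjust to the init
# script storage supported in your workspace)
databricks fs cp ./init-certs.sh dbfs:/databricks/scripts/init-certs.sh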
In this example init script, PEM format CA certificates are added to the file myca.crt, which is located at /usr/local/share/ca-certificates/.
You should replace the values <first-custom-certificate-chain> and <second-custom-certificate-chain> in the example code with your custom certificate details. You can add as many custom certificates as you need.
#!/bin/bash
cat << 'EOF' > /usr/local/share/ca-certificates/myca.crt
-----BEGIN CERTIFICATE-----
<first-custom-certificate-chain>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<second-custom-certificate-chain>
-----END CERTIFICATE-----
EOF
# Rebuild the Linux system cert store so it includes the new certificates
update-ca-certificates

PEM_FILE="/etc/ssl/certs/myca.pem"
PASSWORD="changeit"
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
KEYSTORE="$JAVA_HOME/lib/security/cacerts"
CERTS=$(grep 'END CERTIFICATE' $PEM_FILE | wc -l)

# To process multiple certs with keytool, you need to extract
# each one from the PEM file and import it into the Java KeyStore.
for N in $(seq 0 $(($CERTS - 1))); do
  ALIAS="$(basename $PEM_FILE)-$N"
  echo "Adding to keystore with alias:$ALIAS"
  cat $PEM_FILE |
    awk "n==$N { print }; /END CERTIFICATE/ { n++ }" |
    keytool -noprompt -import -trustcacerts \
      -alias $ALIAS -keystore $KEYSTORE -storepass $PASSWORD
done

# Point Python requests and other OpenSSL-based clients at the updated system bundle
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
echo "export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
Note
To use your custom CA certificates with DBFS FUSE (AWS | Azure | GCP), add /databricks/spark/scripts/restart_dbfs_fuse_daemon.sh to the end of your init script.
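For example, the last lines of the init script would become:
# Restart the DBFS FUSE daemon so it picks up the updated cert stores
/databricks/spark/scripts/restart_dbfs_fuse_daemon.sh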
Troubleshooting
If you get an error message like bash: line : $'\r': command not found or bash: line : warning: here-document at line 3 delimited by end-of-file (wanted `EOF'), you may have Windows-style newline characters present.
Use cat -v <file-name> to view the file and look for any hidden characters.
If you do have unwanted carriage return characters present, you can use a utility like dos2unix to convert the file to a standard *nix-style format.
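For example, assuming your init script is saved locally as init-certs.sh:
# Windows-style line endings show up as ^M at the end of each line
cat -v init-certs.sh | head
# Convert the file to Unix line endings in place
dos2unix init-certs.sh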