How to import a custom CA certificate

Learn how to import a custom CA certificate into your Databricks cluster for Python use.

Written by arjun.kaimaparambilrajan

Last published at: February 29th, 2024

When working with Python, you may need to import a custom CA certificate to avoid connection errors to your endpoints, such as the following:

ConnectionError: HTTPSConnectionPool(host='my_server_endpoint', port=443): Max retries exceeded with url: /endpoint (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fb73dc3b3d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
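Before changing cluster configuration, you can check whether a failure is caused by an untrusted CA by testing the endpoint outside Python, for example with curl. This is a minimal sketch; my_server_endpoint is the placeholder host from the error message above, and /path/to/custom-ca.pem is a hypothetical bundle path.

# Uses the system CA bundle; fails with a certificate verification
# error if the server's CA is not trusted.
curl -v https://my_server_endpoint/endpoint

# Trust a custom CA bundle for this one request only.
curl --cacert /path/to/custom-ca.pem https://my_server_endpoint/endpoint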

Similarly, you may need to add custom certificates to the default Java cacerts store so that Apache Spark JVMs can access the same endpoints.
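For a single certificate, the Java-side import is a one-liner with keytool. This is a minimal sketch, assuming a certificate file my-cert.pem (a hypothetical name) and the default cacerts password changeit; the init script below automates the same import for a full certificate chain.

# Import one PEM certificate into the default Java truststore.
# my-cert.pem is a hypothetical file name; changeit is the default password.
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
keytool -noprompt -import -trustcacerts \
        -alias my-custom-ca -keystore "$JAVA_HOME/lib/security/cacerts" \
        -storepass changeit -file my-cert.pem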

Instructions

To import one or more custom CA certificates to your Databricks compute, you can create an init script that adds the entire CA certificate chain to both the Linux SSL and Java default cert stores, and sets the REQUESTS_CA_BUNDLE environment variable.

The resulting init script can be configured as a cluster-scoped init script or a global init script.

For more information on configuring init scripts, please review the What are init scripts? (AWS | Azure | GCP) documentation.

In this example init script, PEM-format CA certificates are added to the file myca.crt, which is located at /usr/local/share/ca-certificates/.

You should replace the values <first-custom-certificate-chain> and <second-custom-certificate-chain> in the example code with your custom certificate details. You can add as many custom certificates as you need.

#!/bin/bash

cat << 'EOF' > /usr/local/share/ca-certificates/myca.crt
-----BEGIN CERTIFICATE-----
<first-custom-certificate-chain>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<second-custom-certificate-chain>
-----END CERTIFICATE-----
EOF

# Refresh the system CA store. This adds myca.crt to the combined bundle
# /etc/ssl/certs/ca-certificates.crt and symlinks it as /etc/ssl/certs/myca.pem.
update-ca-certificates

PEM_FILE="/etc/ssl/certs/myca.pem"
PASSWORD="changeit"
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
KEYSTORE="$JAVA_HOME/lib/security/cacerts"

CERTS=$(grep -c 'END CERTIFICATE' "$PEM_FILE")

# To process multiple certs with keytool, you need to extract
# each one from the PEM file and import it into the Java KeyStore.

for N in $(seq 0 $(($CERTS - 1))); do
  ALIAS="$(basename $PEM_FILE)-$N"
  echo "Adding to keystore with alias:$ALIAS"
  # Print only the Nth certificate in the PEM file, then import it.
  awk "n==$N { print }; /END CERTIFICATE/ { n++ }" "$PEM_FILE" |
    keytool -noprompt -import -trustcacerts \
            -alias "$ALIAS" -keystore "$KEYSTORE" -storepass "$PASSWORD"
done

# Point Python clients (such as requests) at the updated system CA bundle.
echo "export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh
echo "export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt" >> /databricks/spark/conf/spark-env.sh

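After the cluster starts, you can verify that the certificates were picked up. This is a minimal sketch, assuming the init script above ran and the default changeit keystore password; run it from a notebook %sh cell or the web terminal.

# Confirm update-ca-certificates generated the combined PEM.
openssl x509 -in /etc/ssl/certs/myca.pem -noout -subject -enddate

# Confirm the certificates were imported into the Java keystore.
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
keytool -list -keystore "$JAVA_HOME/lib/security/cacerts" -storepass changeit | grep myca.pem

# Confirm the variables exported via spark-env.sh are set.
env | grep -E 'REQUESTS_CA_BUNDLE|SSL_CERT_FILE'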

Note

To use your custom CA certificates with DBFS FUSE (AWS | Azure | GCP), add /databricks/spark/scripts/restart_dbfs_fuse_daemon.sh to the end of your init script. 
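With that addition, the end of the init script would look like the following (the restart script path is taken from the note above):

# Restart the DBFS FUSE daemon so it picks up the updated CA store.
/databricks/spark/scripts/restart_dbfs_fuse_daemon.sh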

Troubleshooting

If you get an error message like bash: line : $'\r': command not found or bash: line : warning: here-document at line 3 delimited by end-of-file (wanted `EOF'), you may have Windows-style line endings in your init script.

Use cat -v <file-name> to view the file and look for any hidden characters; carriage returns appear as ^M.

If unwanted carriage returns are present, you can use a utility like dos2unix to convert the file to the standard *nix-style format.
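For example, assuming the init script is saved as my-init.sh (a hypothetical file name):

# Show hidden characters; Windows line endings appear as ^M at the ends of lines.
cat -v my-init.sh

# Convert the file to Unix-style line endings in place.
dos2unix my-init.sh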