You can use IAM session tokens with Hadoop config support to access S3 storage in Databricks Runtime 8.3 and above.
Extract the session credentials from your cluster
You will need the Instance Profile from your cluster. This can be found under Advanced Options in the cluster configuration.
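If you only have notebook access, you can also confirm the name to use in the metadata requests below by querying the instance metadata service without a role name. This is a minimal sketch, assuming the default IMDSv1 endpoint is reachable from the driver.
%python
import requests

# Listing the security-credentials path without a role name returns the
# attached IAM role name(s) as plain text.
roles = requests.get(
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
).text
print(roles)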
Use curl to display the AccessKeyId, SecretAccessKey, and Token.
%sh
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<instance-profile>
Alternatively, you can use a Python script.
%python
import requests

response = requests.get("http://169.254.169.254/latest/meta-data/iam/security-credentials/<instance-profile>")
credentials = response.json()
print(credentials)
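The credentials are returned as a JSON document with AccessKeyId, SecretAccessKey, Token, and Expiration fields, so you can read the individual values from the parsed dictionary instead of copying them from the printed output. A minimal sketch, assuming the request above succeeded:
%python
# Pull the individual values out of the parsed credentials document.
access_key = credentials["AccessKeyId"]
secret_key = credentials["SecretAccessKey"]
token = credentials["Token"]

# Expiration shows when these temporary credentials stop working.
print(credentials["Expiration"])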
The IP address should not be modified. 169.254.169.254 is a link-local address and is valid only from the instance.
Use session credentials in a notebook
You can use the session credentials by entering them into a notebook.
%python
AccessKey = "<AccessKeyId>"
Secret = "<SecretAccessKey>"
Token = "<Token>"

sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", AccessKey)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", Secret)
sc._jsc.hadoopConfiguration().set("fs.s3a.session.token", Token)
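If you would rather not copy the values by hand, the extraction and configuration steps can be combined in a single cell. This is a minimal sketch, assuming the same <instance-profile> placeholder and metadata endpoint as above:
%python
import requests

# Fetch the temporary credentials from the instance metadata service
# and apply them to the Hadoop configuration in one step.
credentials = requests.get(
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/<instance-profile>"
).json()

sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", credentials["AccessKeyId"])
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", credentials["SecretAccessKey"])
sc._jsc.hadoopConfiguration().set("fs.s3a.session.token", credentials["Token"])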
Once the session credentials are loaded in the notebook, you can access files in the S3 bucket with an S3A URI.
%python
dbutils.fs.ls("s3a://<path-to-folder>/")
Use session credentials in the cluster config
You can add the session credentials to the cluster Spark config. This makes them accessible to all notebooks on the cluster.
fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
fs.s3a.access.key <AccessKeyId>
fs.s3a.secret.key <SecretAccessKey>
fs.s3a.session.token <Token>
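Once the cluster is restarted with this configuration, any notebook attached to it can read the bucket without setting credentials in the notebook itself, for example:
%python
dbutils.fs.ls("s3a://<path-to-folder>/")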