Access S3 with temporary session credentials

Extract IAM session credentials and use them to access S3 storage via an S3A URI. Requires Databricks Runtime 8.3 and above.

Written by Gobinath.Viswanathan

Last published at: May 16th, 2022

You can use IAM session tokens with Hadoop config support to access S3 storage in Databricks Runtime 8.3 and above.

Info

You cannot mount the S3 path as a DBFS mount when using session credentials. You must use the S3A URI.

Extract the session credentials from your cluster

You will need the Instance Profile name from your cluster. You can find it under Advanced Options in the cluster configuration.
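If you are unsure of the name, you can also query the instance metadata service from a notebook. Requesting the security-credentials path without a profile name returns the name of the IAM role attached to the instance. A minimal sketch:

%python

import requests

# Requesting the security-credentials path with no profile name returns the
# name of the IAM role attached to the instance
response = requests.get("http://169.254.169.254/latest/meta-data/iam/security-credentials/")
print(response.text)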

Use curl to display the AccessKeyId, SecretAccessKey, and Token.

%sh

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<instance-profile>

Alternatively, you can use a Python script.

%python

import requests

# Query the instance metadata service for the session credentials
response = requests.get("http://169.254.169.254/latest/meta-data/iam/security-credentials/<instance-profile>")
credentials = response.json()
print(credentials)

Do not modify the IP address. 169.254.169.254 is a link-local address and is only valid when accessed from within the instance.
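The metadata response is a JSON document containing the AccessKeyId, SecretAccessKey, and Token fields. If you used the Python approach, you can pull the values you need directly from the credentials dictionary rather than copying them by hand. A minimal sketch, assuming the credentials variable from the cell above:

%python

# Extract the individual values used to configure S3A access in the next steps
AccessKey = credentials["AccessKeyId"]
Secret = credentials["SecretAccessKey"]
Token = credentials["Token"]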

Info

You can only extract a session token from a standard cluster. This will not work on a high concurrency cluster.

Use session credentials in a notebook

You can use the session credentials by entering them into a notebook and setting them in the S3A Hadoop configuration.

%python

AccessKey = "<AccessKeyId>"
Secret = "<SecretAccessKey>"
Token = "<Token>"

# Configure the S3A connector to authenticate with temporary session credentials
sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", AccessKey)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", Secret)
sc._jsc.hadoopConfiguration().set("fs.s3a.session.token", Token)

Once the session credentials are loaded in the notebook, you can access files in the S3 bucket with an S3A URI.

%python

dbutils.fs.ls("s3a://<path-to-folder>/")
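You can also read data directly into a DataFrame with the same URI scheme. A minimal sketch, where the CSV path is a hypothetical placeholder:

%python

# Read a CSV file from the bucket over S3A using the session credentials set above
df = spark.read.csv("s3a://<path-to-folder>/<file>.csv", header=True)
display(df)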

Use session credentials in the cluster config

You can add the session credentials to the cluster Spark config. This makes them accessible to all notebooks on the cluster.

fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
fs.s3a.access.key <AccessKeyId>
fs.s3a.secret.key <SecretAccessKey>
fs.s3a.session.token <Token>
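After you edit the Spark config, restart the cluster so the settings take effect. Any notebook attached to the cluster can then access the bucket without setting credentials itself. For example:

%python

# No per-notebook setup needed; the cluster's Spark config supplies the
# session credentials to the S3A connector
dbutils.fs.ls("s3a://<path-to-folder>/")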