When you try to access S3 data through a DBFS mount or directly with the Spark APIs, the command fails with an exception similar to the following:
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; Request ID: XXXXXXXXXXXXX, Extended Request ID: XXXXXXXXXXXXXXXXXXX, Cloud Provider: AWS, Instance ID: XXXXXXXXXX (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: XXXXXXXXXXX; S3 Extended Request ID:
The following are common causes of this error:
- AWS keys are used in addition to the IAM role. Setting AWS keys in a global init script can cause this behavior.
- The IAM role has the required permission to access the S3 data, but AWS keys are also set in the Spark configuration. For example, setting spark.hadoop.fs.s3a.secret.key can conflict with the IAM role.
- AWS keys are set at the environment level on the driver node of an interactive cluster, typically from a notebook.
- DBFS mount points were created earlier with AWS keys and you are now trying to access them using an IAM role.
- The files are written outside Databricks, and the bucket owner does not have read permission (see Step 7: Update cross-account S3 object ACLs).
- The IAM role is not attached to the cluster.
- An IAM role with read-only permission is attached, but you are trying to perform a write operation. That is, the IAM role does not have adequate permissions for the operation you are attempting.
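Several of the causes above come down to AWS keys being set somewhere that shadows the IAM role. As a quick diagnostic, a helper like the following (a minimal sketch; the function name is hypothetical, and the key names are the common Hadoop S3 connector settings) can report any conflicting settings. In a Databricks notebook you would feed it the live Hadoop configuration and environment:

```python
# Settings that, when present, take precedence over (and conflict with)
# the IAM role attached to the cluster when Spark accesses S3.
CONFLICTING_CONF_KEYS = [
    "fs.s3a.access.key",
    "fs.s3a.secret.key",
    "fs.s3n.awsAccessKeyId",
    "fs.s3n.awsSecretAccessKey",
]
CONFLICTING_ENV_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]


def find_conflicts(hadoop_conf: dict, env: dict) -> list:
    """Return the names of any AWS-key settings found in the given
    Hadoop configuration or environment-variable mappings."""
    conflicts = [k for k in CONFLICTING_CONF_KEYS if hadoop_conf.get(k)]
    conflicts += [v for v in CONFLICTING_ENV_VARS if env.get(v)]
    return conflicts


# In a notebook, you could collect the live values like this:
#   import os
#   hc = spark.sparkContext._jsc.hadoopConfiguration()
#   conf = {k: hc.get(k) for k in CONFLICTING_CONF_KEYS}
#   print(find_conflicts(conf, dict(os.environ)))
```

Any name the helper prints should be removed from the Spark configuration or environment before relying on the IAM role.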
The following recommendations and best practices help you avoid this issue:
- Use IAM roles instead of AWS keys.
- If you are switching from AWS keys to IAM roles, unmount the DBFS mount points for S3 buckets that were created with AWS keys and remount them using the IAM role.
- Avoid using a global init script to set AWS keys. If an init script is required, use a cluster-scoped init script instead.
- Avoid setting AWS keys in a notebook or cluster Spark configuration.
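The unmount-and-remount step can be sketched as a notebook cell using the dbutils.fs API. This is a minimal sketch under stated assumptions: the function name, bucket name, and mount point are placeholders, and it assumes the cluster already has the IAM role (instance profile) attached. Mounting a plain s3a:// URI with no embedded keys makes access fall back to that role:

```python
def remount_with_iam_role(dbutils, bucket: str, mount_point: str) -> None:
    """Unmount a mount point that was created with AWS keys and remount
    it so the cluster's IAM role is used for authentication."""
    # Unmount only if the mount point currently exists.
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.unmount(mount_point)
    # A source URI without embedded keys relies on the attached IAM role.
    dbutils.fs.mount(source=f"s3a://{bucket}", mount_point=mount_point)


# Example call from a notebook (names are placeholders):
# remount_with_iam_role(dbutils, "my-bucket", "/mnt/my-mount")
```

Clusters that reference the mount point should be restarted afterward so they pick up the new mount.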