DatabricksS3LoggingException errors while attempting to read a binary file in an Apache Spark structured streaming job

Disable the Delta format check and verify the IAM role setup.

Written by kalpesh.shimpi

Last published at: August 29th, 2024

Problem

You see repeated DatabricksS3LoggingException errors when reading a binary file in an Apache Spark structured streaming job. The job itself does not fail, which indicates that the IAM role has the permissions needed to read the data. The error message typically includes a 403 Forbidden status code, which suggests an access issue with the S3 bucket.

Example

You try to read a binary file.

spark.read.format("binaryFile").load(file_path)

Spark configuration used in the example job:

spark.hadoop.fs.s3a.credentialsType AssumeRole
spark.hadoop.fs.s3a.stsAssumeRole.arn arn:aws:iam::xxxx:role/instance-profile-role
spark.hadoop.fs.s3a.canned.acl BucketOwnerFullControl

After trying to read the binary file, you get a DatabricksS3LoggingUtils error in the driver log4j2 logs.

ERROR DatabricksS3LoggingUtils$:V3: S3 request failed with com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://<s3-bucket>.<s3-region>.amazonaws.com _delta_log {}
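
To confirm that the errors in your driver logs are this harmless _delta_log probe and not a genuine access failure, you can filter the log lines. The following is a minimal sketch that assumes your log lines follow the layout of the single example above. The pattern and the helper name is_delta_probe_403 are illustrative and not part of any Databricks API.

```python
import re

# Pattern modeled on the one log line shown in this article.
# Other log layouts are not covered by this sketch.
DELTA_PROBE = re.compile(
    r"S3 request failed with .*AmazonS3Exception: Forbidden; "
    r"request: HEAD (?P<url>\S+) (?P<key>\S*_delta_log\S*)"
)


def is_delta_probe_403(line: str) -> bool:
    """Return True if the line is a 403 on a HEAD request for _delta_log."""
    return DELTA_PROBE.search(line) is not None


# The example line from this article, with its placeholders left as-is.
sample = (
    "ERROR DatabricksS3LoggingUtils$:V3: S3 request failed with "
    "com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; "
    "request: HEAD https://<s3-bucket>.<s3-region>.amazonaws.com _delta_log {}"
)

print(is_delta_probe_403(sample))
```

A 403 on any other key or request type does not match the pattern and should be investigated as a real permission problem.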

Cause

The DatabricksS3LoggingException error is related to the way Spark handles binary file reads in structured streaming jobs. 

When reading binary format files, Spark checks whether the specified path (file_path) is a Delta table by issuing a HEAD request for the _delta_log directory under that path. Because the files are not in Delta format, no _delta_log directory exists. S3 answers the request with 403 Forbidden instead of 404 Not Found, which is how S3 typically responds for a missing key when the role lacks the s3:ListBucket permission on the bucket. The check is enabled by default, so the error is logged even though the actual read succeeds.

Solution

  1. Disable the Delta format check by setting spark.databricks.delta.formatCheck.enabled to false in the compute cluster's Spark config.
  2. Ensure the IAM role used by the Spark job has the permissions it needs to access the S3 bucket.
  3. Verify the policies attached to both the instance profile IAM role and the role it assumes (arn:aws:iam::xxxx:role/instance-profile-role).
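
When reviewing the policies in steps 2 and 3, it can help to compare them against a minimal read-only policy. The following sketch builds one as JSON. The bucket name my-example-bucket is a hypothetical placeholder, and the action list is an assumption about what a read-only binaryFile job needs. Your job may require more, for example write or ACL permissions.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "my-example-bucket"


def build_read_policy(bucket: str) -> dict:
    """Return a minimal read-only S3 policy document for one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # With list access, S3 can answer 404 instead of 403
                # for keys that do not exist.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object reads apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = build_read_policy(BUCKET)
print(json.dumps(policy, indent=2))
```

Note that s3:ListBucket applies to the bucket ARN while s3:GetObject applies to the object ARN (bucket/*). Attaching either action to the wrong resource is a common cause of 403 responses.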

For more information, review the Configure S3 access with an instance profile tutorial.