Access files written by Apache Spark on ADLS Gen1

Configure permissions to allow access to files that Apache Spark writes to ADLS Gen1 storage.

Last published at: December 9th, 2022

Problem

You are using Azure Databricks and have a Spark job that is writing to ADLS Gen1 storage.

When you try to manually read, write, or delete data in the folders, you get an error message:

Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation

Cause

When writing data to ADLS Gen1 storage, Apache Spark uses the service principal as the owner of the files it creates. The service principal is defined in dfs.adls.oauth2.client.id.

When files are created, they inherit the default permissions from the Hadoop filesystem. The Hadoop filesystem has a default permission of 666 (-rw-rw-rw-) and a default umask of 022, which results in the 644 permission setting as the default for files.
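The umask arithmetic can be checked with a few lines of Python. This is a minimal sketch of the standard POSIX-style calculation (clear every bit that the umask sets); the function name apply_umask is illustrative and not part of Hadoop or Spark.

```python
def apply_umask(base_mode: int, umask: int) -> int:
    """Clear the permission bits that are set in the umask."""
    return base_mode & ~umask


# Hadoop default file permission 666 combined with the default umask 022.
file_mode = apply_umask(0o666, 0o022)
print(oct(file_mode))  # 0o644
```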

When folders are created, they inherit the parent folder permissions, which are 770 by default.

Because the owner is the service principal and not your user account, your access is governed by the "other" permission class. The folder permissions grant nothing to that class (the final 0 in 770), so you don’t have permission to access the folder.
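To see which permission bits apply to a given caller, it can help to render the mode symbolically. The sketch below uses Python's standard stat module. It models only the basic owner/group/other classes and ignores any named ACL entries that ADLS Gen1 may also evaluate; the helper name permission_class_bits is hypothetical.

```python
import stat


def permission_class_bits(mode: int, is_owner: bool, in_group: bool) -> str:
    """Return the rwx triplet that applies to a caller of a folder."""
    text = stat.filemode(stat.S_IFDIR | mode)  # for 0o770: 'drwxrwx---'
    if is_owner:
        return text[1:4]
    if in_group:
        return text[4:7]
    return text[7:10]


folder_mode = 0o770
print(stat.filemode(stat.S_IFDIR | folder_mode))  # drwxrwx---
# A user who is neither the service principal nor in its group:
print(permission_class_bits(folder_mode, is_owner=False, in_group=False))  # ---
```

Under the same simplified model, a caller in the owning group gets rwx on a 770 folder, and every caller gets rwx on a 777 folder.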

Solution

Option 1

Add the service principal to the same group as the default user. This allows access to the data when you browse storage through the portal.

Please reach out to Microsoft support for assistance.

Option 2

Create a base folder in ADLS Gen1 and set the permissions to 777. Write Spark output under this folder. Because folders created by Spark inherit the parent folder permissions, all folders created by Spark will have 777 permissions. This allows any user to access the folders.

Option 3

Change the default umask from 022 to 000 on your Azure Databricks clusters.

Set spark.hadoop.fs.permissions.umask-mode 000 in the Spark config for your cluster.

With a umask of 000, the default Hadoop filesystem permission of 666 becomes the default permission used when Azure Databricks creates objects.
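To confirm that the setting reached the cluster, you can read it back from a notebook. The following is a minimal sketch, assuming the notebook's predefined spark session exposes cluster-level Spark config through spark.conf.get. The helper name configured_umask and the "not set" fallback string are illustrative and not part of any Databricks API.

```python
def configured_umask(spark, fallback="not set"):
    """Return the umask set through the cluster's Spark config.

    `spark` is any object with a SparkSession-style `conf.get(key, default)`.
    """
    key = "spark.hadoop.fs.permissions.umask-mode"
    return spark.conf.get(key, fallback)
```

In a notebook attached to the cluster, configured_umask(spark) should return 000 once the cluster has restarted with the new config. A result of "not set" suggests that the config line was not applied.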
