Databricks jobs fail due to a lack of disk space, even though storage auto-scaling is enabled.
When you review the cluster event log, you see a message stating that the instance
failed to expand disk due to an authorization error:
Instance i-xxxxxxxxx failed to expand disk because: You are not authorized to perform this operation. Encoded authorization failure message: xxxxx(Service: AmazonEC2; Status Code: 403; Error Code: UnauthorizedOperation; Request ID: xxxxxx). Current size: 98.30 GB Current free space: 8.89 GB.
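The encoded authorization failure message in this log entry can be decoded to confirm which action was denied. The following is a minimal sketch, assuming you have boto3 installed and that the caller holds the sts:DecodeAuthorizationMessage permission. The helper name denied_action is hypothetical, and the "context"/"action" layout of the decoded JSON is an assumption that should be checked against the actual decoded output.

```python
import json


def denied_action(sts_client, encoded_message):
    """Decode an encoded authorization failure message and return the denied action.

    sts_client is expected to behave like a boto3 STS client, for example
    boto3.client("sts"). encoded_message is the string that follows
    "Encoded authorization failure message:" in the cluster event log.
    """
    response = sts_client.decode_authorization_message(
        EncodedMessage=encoded_message
    )
    # DecodedMessage is returned as a JSON string.
    decoded = json.loads(response["DecodedMessage"])
    # Assumption: the denied action is stored under context -> action.
    # Returns None if the decoded message does not have that layout.
    return decoded.get("context", {}).get("action")
```

If the returned action is one of the EBS volume actions, the cause described in this article most likely applies.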
The underlying AWS account for the workspace does not have the correct permissions to enable Databricks to attach an EBS volume to the instance and then remove it after the cluster is terminated.
Cluster creation uses a different set of permissions for EBS volumes. As a result, cluster creation succeeds, but attempts to expand the existing volume fail.
You must add the following permissions to the Databricks workspace deployment IAM role.
ec2:AttachVolume
ec2:CreateVolume
ec2:DeleteVolume
ec2:DescribeVolumes
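As a rough illustration, the four permissions can be expressed as an IAM policy statement and compared against an existing policy document. This is a sketch under stated assumptions: the Sid, the wildcard resource, and the function names build_ebs_statement and missing_actions are hypothetical, and the check only looks at Action entries in Allow statements. It ignores Deny statements, NotAction, and conditions, so treat its result as a hint rather than a full policy evaluation.

```python
import fnmatch
import json

# The EBS actions this article says must be added to the deployment IAM role.
REQUIRED_ACTIONS = [
    "ec2:AttachVolume",
    "ec2:CreateVolume",
    "ec2:DeleteVolume",
    "ec2:DescribeVolumes",
]


def build_ebs_statement(resource="*"):
    """Return an IAM policy statement that allows the required EBS actions.

    The Sid and the default wildcard resource are illustrative assumptions;
    scope the resource to match your own security requirements.
    """
    return {
        "Sid": "DatabricksEbsAutoscaling",  # hypothetical identifier
        "Effect": "Allow",
        "Action": list(REQUIRED_ACTIONS),
        "Resource": resource,
    }


def missing_actions(policy_document):
    """List required actions not matched by any Allow statement's Action entries."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    granted = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lowercase.
        granted.extend(action.lower() for action in actions)

    # Patterns such as "ec2:*" are matched with shell-style wildcards.
    return [
        required
        for required in REQUIRED_ACTIONS
        if not any(fnmatch.fnmatchcase(required.lower(), g) for g in granted)
    ]


if __name__ == "__main__":
    print(json.dumps(build_ebs_statement(), indent=2))
```

Running missing_actions on the policy document currently attached to the role shows which of the four actions still need to be added.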
You can find the Databricks workspace deployment IAM role by logging in to the Account Console and navigating to the AWS account tab.