Failed to expand the EBS volume

Problem

Databricks jobs fail because the disk runs out of space, even though storage auto-scaling is enabled.

When you review the cluster event log, you see a message stating that the instance failed to expand disk due to an authorization error.

Instance i-xxxxxxxxx failed to expand disk because: You are not authorized to perform this operation. Encoded authorization failure message: xxxxx(Service: AmazonEC2; Status Code: 403; Error Code: UnauthorizedOperation; Request ID: xxxxxx). Current size: 98.30 GB Current free space: 8.89 GB.
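
If you export cluster event log messages as text, you can scan them for this failure. The following is a minimal Python sketch, assuming the message wording matches the example above. The pattern and the function name parse_disk_expand_error are hypothetical and not part of any Databricks API; the exact wording may differ between releases.

```python
import re

# Pattern modeled on the event log message quoted above. The wording in
# other Databricks releases is an assumption and may differ.
DISK_EXPAND_ERROR = re.compile(
    r"Instance (?P<instance_id>\S+) failed to expand disk because: "
    r"(?P<reason>.*?)"
    r"\s*Current size: (?P<size_gb>[\d.]+) GB"
    r"\s*Current free space: (?P<free_gb>[\d.]+) GB",
    re.DOTALL,
)


def parse_disk_expand_error(message):
    """Return details of a failed disk expansion, or None if no match."""
    match = DISK_EXPAND_ERROR.search(message)
    if match is None:
        return None
    return {
        "instance_id": match.group("instance_id"),
        # True when the failure is the authorization error described here.
        "unauthorized": "UnauthorizedOperation" in match.group("reason"),
        "size_gb": float(match.group("size_gb")),
        "free_gb": float(match.group("free_gb")),
    }
```

A result with "unauthorized" set to True indicates the permissions problem described in this article, rather than some other reason for the failed expansion.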

Cause

The IAM role that Databricks uses in the workspace's underlying AWS account does not have the permissions required to create an EBS volume, attach it to the instance, and delete it after the cluster is terminated.

Note

Cluster creation uses a different set of permissions for EBS volumes. As a result, cluster creation succeeds, but attempts to expand the existing volume fail.

Solution

You must add the following permissions to the Databricks workspace deployment IAM role.

ec2:AttachVolume
ec2:CreateVolume
ec2:DeleteVolume
ec2:DescribeVolumes
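
The four permissions above can be expressed as an IAM policy statement. The following is a minimal Python sketch that builds such a statement and checks an existing policy document for missing actions. The Sid value, the "Resource": "*" scope, and the function names are assumptions for illustration; adjust them to match your organization's policy. The check is deliberately simple: it only compares exact action names plus the "*" and "ec2:*" wildcards, and it ignores Deny statements and conditions.

```python
import json

# The four actions listed above.
REQUIRED_ACTIONS = [
    "ec2:AttachVolume",
    "ec2:CreateVolume",
    "ec2:DeleteVolume",
    "ec2:DescribeVolumes",
]


def build_statement():
    """Build an IAM policy statement that allows the required actions."""
    return {
        "Sid": "DatabricksEbsAutoscaling",  # hypothetical identifier
        "Effect": "Allow",
        "Action": list(REQUIRED_ACTIONS),
        "Resource": "*",  # assumption: narrow this if your policy requires it
    }


def missing_actions(policy):
    """List required actions that the policy document does not allow."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        allowed.update(actions)
    if "*" in allowed or "ec2:*" in allowed:
        return []
    return [action for action in REQUIRED_ACTIONS if action not in allowed]


if __name__ == "__main__":
    # Print the statement as JSON, ready to paste into a policy document.
    print(json.dumps(build_statement(), indent=2))
```

Load the role's current policy JSON with json.load and pass it to missing_actions to see which of the four permissions still need to be added.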

You can find the Databricks workspace deployment IAM role by logging in to the Account Console and navigating to the AWS account tab.