Problem
When you try to add another availability zone subnet to a Databricks network in AWS, updating the workspace network configuration in the admin settings fails with the following error.
error_code: INVALID_STATE_TRANSITION, message: INVALID_STATE: Please terminate all pool & cluster EC2 instances in your workspace subnets before attempting to update the workspace.
Cause
You have active compute resources, including clusters and pools, operating in the workspace. These result in backend EC2 instances remaining active, which obstructs the transition to the new network state required to update the configuration.
For more information, refer to the Update workspace configuration API documentation.
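For orientation, the update request can be sketched as follows. This is a minimal sketch, not the documented procedure: it assumes the Account API endpoint `PATCH /api/2.0/accounts/{account_id}/workspaces/{workspace_id}` on `accounts.cloud.databricks.com` and a `network_id` field in the request body. The helper names, IDs, and token are hypothetical placeholders; confirm the details against the API documentation for your account.

```python
import json
import urllib.request

# Assumption: the AWS accounts host for the Databricks Account API.
ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def build_workspace_update_request(account_id, workspace_id, network_id, token):
    """Build (but do not send) the PATCH request that points a workspace
    at a different network configuration."""
    url = f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/workspaces/{workspace_id}"
    body = json.dumps({"network_id": network_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def is_invalid_state_error(response_text):
    """Return True if an error body contains the INVALID_STATE marker
    seen in the error message above."""
    return "INVALID_STATE" in response_text
```

Sending the request with `urllib.request.urlopen(request)` raises `urllib.error.HTTPError` on a failed update; passing the decoded error body to `is_invalid_state_error` tells you whether active EC2 instances are the likely cause.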
Solution
1. Create a new network configuration.
   - Construct a network configuration that includes all subnets (current and new subnets) as well as necessary network details, such as security groups and VPC information.
2. Schedule a time to stop all active compute resources, preferably during a maintenance window to avoid disruption.
3. Stop or pause all active compute resources.
   - Stop running workloads and clusters in the Databricks workspace.
   - Pause any scheduled jobs and pipelines.
   - Also ensure there are no active EC2 instances in the subnets associated with the workspace.
4. Update your workspace by selecting the newly created configuration.
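The preparation steps above can be sketched in code. This is an illustration under stated assumptions, not the official procedure: the payload fields (`network_name`, `vpc_id`, `subnet_ids`, `security_group_ids`) are assumed to match the Account API's create network configuration request, and the cluster list is assumed to have the shape returned by `GET /api/2.0/clusters/list` (a `clusters` array whose entries carry `cluster_id` and `state`). All function names are hypothetical.

```python
def merge_subnets(current_subnet_ids, new_subnet_ids):
    """Combine current and new subnet IDs, dropping duplicates while
    preserving the original order."""
    return list(dict.fromkeys([*current_subnet_ids, *new_subnet_ids]))


def build_network_payload(name, vpc_id, current_subnet_ids,
                          new_subnet_ids, security_group_ids):
    """Build the body for a new network configuration that covers
    both the existing subnets and the ones being added."""
    return {
        "network_name": name,
        "vpc_id": vpc_id,
        "subnet_ids": merge_subnets(current_subnet_ids, new_subnet_ids),
        "security_group_ids": list(security_group_ids),
    }


def clusters_needing_termination(clusters_list_response):
    """Return IDs of clusters that are not already terminated or
    in the process of terminating."""
    finished = {"TERMINATED", "TERMINATING"}
    return [
        cluster["cluster_id"]
        for cluster in clusters_list_response.get("clusters", [])
        if cluster.get("state") not in finished
    ]
```

The sketch does not cover instance pools, scheduled jobs, or pipelines, which still need to be stopped or paused separately before the update.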
If the update in step four fails with the same error, there are still active EC2 instances. To find and deactivate them:
1. Log in to the AWS Management Console and navigate to the EC2 section.
2. Filter instances using the workspace ID as a tag to identify active instances tied to the workspace.
3. Review the tags of the instances to locate details like associated cluster names.
4. Terminate the clusters connected to these instances using Databricks' cluster management tools.
After confirming that no EC2 instances are active in the workspace subnets, retry the workspace update.
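The same check can be scripted instead of done in the console. The sketch below is one possible approach, assuming a boto3 EC2 client is passed in and that filtering by subnet ID is acceptable in place of the tag filter described above; the function name is hypothetical. It uses the EC2 `describe_instances` filters `subnet-id` and `instance-state-name`, and returns each instance's tags as-is, without assuming any particular tag keys.

```python
# Instance states treated as "active" in this sketch; adjust as needed.
ACTIVE_STATES = ("pending", "running")


def find_active_instances(ec2_client, subnet_ids, states=ACTIVE_STATES):
    """List instances in the given subnets that are in one of the given
    states, following NextToken pagination until all pages are read."""
    kwargs = {
        "Filters": [
            {"Name": "subnet-id", "Values": list(subnet_ids)},
            {"Name": "instance-state-name", "Values": list(states)},
        ]
    }
    found = []
    while True:
        page = ec2_client.describe_instances(**kwargs)
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                # Flatten the tag list into a dict for easier review.
                tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
                found.append({
                    "instance_id": instance["InstanceId"],
                    "subnet_id": instance.get("SubnetId"),
                    "tags": tags,
                })
        token = page.get("NextToken")
        if not token:
            return found
        kwargs["NextToken"] = token
```

A typical call would pass `boto3.client("ec2", region_name=...)` for the workspace region together with the workspace subnet IDs. An empty result suggests the subnets are clear and the workspace update can be retried.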