Problem:
You are creating a GPU ML cluster and the request fails with a VcpuLimitExceeded error. The message says the requested vCPU capacity exceeds the current vCPU limit of 0 for the instance bucket that the specified instance type belongs to. A limit of 0 means the AWS account cannot currently launch any instances from that bucket.
Solution:
- Check with your AWS team and the AWS support team to confirm which instance bucket the VcpuLimitExceeded error refers to and why its vCPU limit is 0.
- Visit http://aws.amazon.com/contact-us/ec2-request and request an adjustment to the vCPU limit.
- Make sure the increased limit in your AWS account is large enough to accommodate the total vCPU capacity the cluster requests, counting every instance it will launch.
- Retry creating the GPU ML cluster once the vCPU limit has been adjusted and the capacity has been increased.
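
To decide how large an adjustment to request, it can help to work out how many vCPUs the cluster needs and compare that with the current limit. The following is a minimal Python sketch, not an official AWS tool. The function names (required_vcpus, limit_shortfall, current_vcpu_limit) are hypothetical and chosen for illustration. It assumes every node in the cluster uses the same instance type, and the node count and vCPUs per node in the usage note are example values only. The optional lookup assumes boto3 is installed, that AWS credentials are configured, and that you pass in the Service Quotas quota code for your instance bucket (look it up in the Service Quotas console; it is not hard-coded here).

```python
def required_vcpus(node_count: int, vcpus_per_node: int) -> int:
    """Total vCPUs needed, assuming every node uses the same instance type."""
    if node_count < 0 or vcpus_per_node < 0:
        raise ValueError("node_count and vcpus_per_node must be non-negative")
    return node_count * vcpus_per_node


def limit_shortfall(required: int, current_limit: int) -> int:
    """How many vCPUs the limit must grow by; 0 if the limit already suffices."""
    return max(0, required - current_limit)


def current_vcpu_limit(quota_code: str, region: str) -> float:
    """Optional: read the current limit from the Service Quotas API.

    quota_code is an assumption supplied by the caller; it must be the code
    of the vCPU quota that covers your instance type.
    """
    import boto3  # imported here so the helpers above work without boto3

    client = boto3.client("service-quotas", region_name=region)
    response = client.get_service_quota(ServiceCode="ec2", QuotaCode=quota_code)
    return response["Quota"]["Value"]
```

For example, a cluster of 4 nodes with 8 vCPUs each needs required_vcpus(4, 8) = 32 vCPUs. With the limit of 0 reported in the error, limit_shortfall(32, 0) is 32, so the adjustment request should ask for a limit of at least 32 vCPUs.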