CUDA out of memory error message in GPU clusters

Change the GPU device used by your driver and/or worker nodes.

Written by jessica.santos

Last published at: October 24th, 2024

Problem

When training a model or fine-tuning a base model on a GPU compute cluster, you encounter the following error (the GiB and MiB values vary):

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 492.00 MiB (GPU 0; 21.99 GiB total capacity; 20.84 GiB already allocated; 19.00 MiB free; 21.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This error is typically raised by workloads that use PyTorch or libraries built on top of PyTorch, such as the Hugging Face Transformers library.
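The error message itself points to one mitigation: when reserved memory is much larger than allocated memory, the PyTorch caching allocator may be fragmenting. The following is a minimal sketch of setting that allocator option before PyTorch initializes CUDA; the 128 MiB value is only an illustrative choice and may need tuning for your workload.

import os

# Must be set before the first CUDA allocation is made.
# max_split_size_mb stops the caching allocator from splitting blocks larger than
# this size, which can reduce fragmentation; 128 is an example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch (or restart the Python process) only after setting the variable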

Cause

The GPU has run out of memory: PyTorch attempted to allocate an additional block for the model, but the requested amount exceeds the free memory remaining on the GPU device.

Info

GPU memory is separate from the memory used by the worker and driver nodes of the cluster. GPU memory is specific to the GPU device being used for computations.


You can check GPU utilization on the Metrics tab of the cluster you use to run your notebook. From there, filter the results by selecting GPU from the dropdown in the top-right corner of the page.
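You can also query GPU memory directly from a notebook cell using PyTorch's built-in memory functions. A minimal sketch, assuming a single GPU at device index 0:

import torch

device = torch.device("cuda:0")  # first (and often only) GPU on the node

total = torch.cuda.get_device_properties(device).total_memory  # physical device memory
reserved = torch.cuda.memory_reserved(device)                  # held by PyTorch's caching allocator
allocated = torch.cuda.memory_allocated(device)                # occupied by live tensors

print(f"total:     {total / 1024**3:.2f} GiB")
print(f"reserved:  {reserved / 1024**3:.2f} GiB")
print(f"allocated: {allocated / 1024**3:.2f} GiB")

The reserved and allocated values correspond to the "reserved in total by PyTorch" and "already allocated" figures in the error message.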

Solution

Select a suitable GPU device for your intended task, whether it's model training, fine-tuning, or inference. After determining which GPU device is best suited for your workload, navigate to Compute, select an existing cluster or create a new one, then select a Driver/Worker node type that utilizes the chosen GPU device. Once you've made this selection, you can resume working with your model.
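If you prefer to create the cluster programmatically rather than through the Compute UI, the Databricks SDK for Python exposes the same settings. The sketch below is only an illustration: the GPU ML runtime string and the g5.xlarge (A10G) instance type are placeholder assumptions that you should replace with values available in your workspace, cloud, and region.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # authenticates from the environment or a configuration profile

cluster = w.clusters.create(
    cluster_name="gpu-model-training",
    spark_version="15.4.x-gpu-ml-scala2.12",  # placeholder: choose a GPU ML runtime listed in your workspace
    node_type_id="g5.xlarge",                 # placeholder: AWS instance type with an A10G GPU
    driver_node_type_id="g5.xlarge",          # placeholder: use a GPU driver if the driver also runs GPU code
    num_workers=1,
    autotermination_minutes=60,
).result()  # blocks until the cluster is running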

Info

Each cloud provider decides which instance types are available in each region. Review the cloud provider documentation (AWS, Azure, GCP) to determine whether a specific GPU is available in the region you are using.


Model training

Research the GPU devices available in your cloud provider's compute instances. For example, to address the problem stated in the error message above, if your current cluster instances use T4 GPU devices, consider switching to A10 or V100 devices, which offer larger memory capacities. Then rerun your process.
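After switching instance types, a quick check from a notebook cell confirms which GPU devices the cluster exposes and how much memory each one has. A minimal sketch:

import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.2f} GiB")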

Fine-tuning or inference

Check the model's repository on GitHub or its page on Hugging Face to see whether particular GPU devices are recommended for your task. For example, the Databricks Dolly LLM GitHub repository specifies which GPU instances to use to get started with response generation and training.
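As an illustration, the usage pattern documented on Dolly's Hugging Face page loads the model through the Transformers pipeline API. The snippet below is a sketch that assumes the smaller databricks/dolly-v2-3b variant, a GPU with enough memory for it, and that the accelerate package is installed (required for device_map="auto").

import torch
from transformers import pipeline

# Smaller Dolly variant chosen as an example; larger variants need GPUs with more memory.
generate_text = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.bfloat16,  # roughly halves memory use compared to float32
    trust_remote_code=True,      # Dolly ships a custom text generation pipeline
    device_map="auto",           # places the model on the available GPU(s)
)

res = generate_text("Explain what a GPU compute cluster is.")
print(res[0]["generated_text"])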