Problem
While performing model training on a GPU-enabled compute, you receive an error message similar to the following example.
warnings.warn(
OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU
File <command-xxxxxx>, line 88
     85     return outputs[0]["generated_text"]
     87 # Instantiate the Hugging Face–based LLMs
---> 88 response_llm = MistralLLM()
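For context, this error typically originates from a wrapper class that loads the model at the default 32-bit precision and then moves it onto the GPU. The following is a hypothetical reconstruction based on the traceback above; the class name MistralLLM comes from the traceback, but the model ID, prompt handling, and generation settings are assumptions.

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

class MistralLLM:
    def __init__(self):
        # Loads the full model in the default 32-bit precision
        self.model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
        self.tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
        # Moving the 32-bit weights onto the GPU here is what raises the OutOfMemoryError
        self.pipe = pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
            device=0 if torch.cuda.is_available() else -1,
        )

    def __call__(self, prompt):
        outputs = self.pipe(prompt, max_new_tokens=256)
        return outputs[0]["generated_text"]

# Instantiate the Hugging Face–based LLM; this is the line that fails
response_llm = MistralLLM()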
Cause
GPU memory utilization is high.
By default, the model loads in 32-bit precision. A model with a large number of parameters (for example, more than seven billion) loaded at this precision requires more GPU memory than the default compute provides.
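As a rough illustration (the seven-billion parameter count is only an example), the model weights alone at 32-bit precision occupy about 26 GiB, which exceeds the memory of many single GPUs before activations and CUDA overhead are counted. Halving the precision to 16 bit halves that footprint:

# Approximate weight-memory footprint for a hypothetical 7B-parameter model
params = 7_000_000_000

fp32_gib = params * 4 / 1024**3  # 4 bytes per parameter at 32-bit precision
fp16_gib = params * 2 / 1024**3  # 2 bytes per parameter at 16-bit precision

print(f"FP32 weights: ~{fp32_gib:.1f} GiB")  # ~26.1 GiB
print(f"FP16 weights: ~{fp16_gib:.1f} GiB")  # ~13.0 GiB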
Solution
Update to a higher-capacity GPU node.
Alternatively, if you do not want to use a higher-capacity GPU node, modify your model’s from_pretrained() call to reduce the precision from 32-bit to 16-bit and to distribute the model layers automatically across the available devices (GPU or CPU).

Make sure to comment out the device=0 if torch.cuda.is_available() else -1 line of code, which is a manual check for GPU availability and conflicts with device_map="auto".
from transformers import AutoModelForCausalLM
import torch

# Comment out this line, which manually selects the GPU when one is available:
# device=0 if torch.cuda.is_available() else -1

model = AutoModelForCausalLM.from_pretrained(
    "<your-model-name>",
    torch_dtype=torch.float16,  # Force FP16, reducing precision from 32-bit to 16-bit
    device_map="auto",          # Automatically place layers on GPU/CPU
)
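With the model loaded this way, you can build the rest of your generation code as before; pipeline infers device placement from a model loaded with device_map="auto", so you do not pass a device argument. A minimal usage sketch, reusing the placeholder model name above and an illustrative prompt:

from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("<your-model-name>")

# No device argument: the pipeline respects the model's existing device map
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

outputs = pipe("Summarize why CUDA runs out of memory.", max_new_tokens=64)
print(outputs[0]["generated_text"])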