GPU metrics indicate that the GPU is not being used during model inference

Ensure that you are sending the model to the GPU in your code.

Written by jessica.santos

Last published at: April 26th, 2025

Problem

You have a PyTorch model that you have already logged and registered in your workspace using MLflow. When you load it with the mlflow.pytorch.load_model() function and pass inputs to perform predictions in your Databricks notebook, you notice after some time that the Cluster Metrics page shows 0% GPU utilization for the cluster attached to the notebook.

 

Cause

You haven't specified the available GPU device when calling the mlflow.pytorch.load_model() function, so the model is loaded onto the CPU by default.
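You can confirm this by inspecting the device of the loaded model's parameters. The sketch below uses a small stand-in torch module in place of the object returned by mlflow.pytorch.load_model(); the same check applies to the real loaded model.

```python
import torch

# Stand-in for the model returned by mlflow.pytorch.load_model().
# Without a device argument, parameters are placed on the CPU by default.
model = torch.nn.Linear(4, 2)

# Inspect where the model's weights actually live.
param_device = next(model.parameters()).device
print(param_device)  # cpu
```

If this prints cpu on a GPU cluster, inference will run on the CPU and GPU utilization will stay at 0%.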

 

Solution

The device parameter was added to the mlflow.pytorch.load_model() function on Dec 27, 2023, so that the model can be sent to a specified device as it is loaded. Pass the available GPU device when loading the model, as in the following example.

import mlflow
import torch

# Use the GPU for inference when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Define the model URI. Replace <experiment_run_id> with the run ID that logged the model.
run_id = "<experiment_run_id>"
logged_model = f"runs:/{run_id}/model"

# Load the model directly onto the selected device.
loaded_model = mlflow.pytorch.load_model(model_uri=logged_model, device=device)
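Note that the input tensors must also live on the same device as the model, or PyTorch raises a device-mismatch error. The sketch below illustrates this with a small stand-in torch module in place of the loaded MLflow model; the model and tensor names are illustrative.

```python
import torch

# Pick the same device used when loading the model.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative model standing in for the loaded MLflow model.
model = torch.nn.Linear(4, 2).to(device)
model.eval()

# Move the inputs to the same device as the model before predicting.
inputs = torch.randn(8, 4).to(device)

with torch.no_grad():
    predictions = model(inputs)

print(predictions.shape)  # torch.Size([8, 2])
```

After loading the model with the device parameter and moving the inputs accordingly, the Cluster Metrics page should show nonzero GPU utilization during inference.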