GPU metrics indicate that the GPU is not being used during model inference

Ensure that you are sending the model to the GPU in your code.

Written by jessica.santos

Last published at: April 26th, 2025

Problem

You have a PyTorch model that you have already logged and registered in your workspace using MLflow. When you load it with the mlflow.pytorch.load_model() function and pass inputs to perform predictions in your Databricks notebook, you notice after some time that the Cluster Metrics page shows 0% GPU utilization for the cluster attached to the notebook.

 

Cause

You haven't specified the available GPU device when calling the mlflow.pytorch.load_model() function, so the model is loaded onto the CPU by default.
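You can confirm this by inspecting the device of the loaded model's parameters. The sketch below uses a small stand-in torch module in place of the object returned by mlflow.pytorch.load_model(); the same check applies to the real loaded model.

```python
import torch

# Stand-in for the model returned by mlflow.pytorch.load_model().
# Without a device argument, parameters are placed on the CPU by default.
model = torch.nn.Linear(4, 2)

# Inspect where the model's weights actually live.
param_device = next(model.parameters()).device
print(param_device)  # cpu
```

If this prints cpu on a GPU cluster, inference will run on the CPU and GPU utilization will stay at 0%.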

 

Solution

The device parameter was added to the mlflow.pytorch.load_model() function on Dec 27, 2023, so that the model can be sent to a specified device as it is loaded. Pass the available GPU device when loading the model, as in the following example.

import mlflow
import torch

# Use the GPU for inference when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Define the model URI. Replace <experiment_run_id> with the run ID that logged the model.
run_id = "<experiment_run_id>"
logged_model = f"runs:/{run_id}/model"

# Load the model directly onto the selected device.
loaded_model = mlflow.pytorch.load_model(model_uri=logged_model, device=device)
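Note that the input tensors must also live on the same device as the model, or PyTorch raises a device-mismatch error. The sketch below illustrates this with a small stand-in torch module in place of the loaded MLflow model; the model and tensor names are illustrative.

```python
import torch

# Pick the same device used when loading the model.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative model standing in for the loaded MLflow model.
model = torch.nn.Linear(4, 2).to(device)
model.eval()

# Move the inputs to the same device as the model before predicting.
inputs = torch.randn(8, 4).to(device)

with torch.no_grad():
    predictions = model(inputs)

print(predictions.shape)  # torch.Size([8, 2])
```

After loading the model with the device parameter and moving the inputs accordingly, the Cluster Metrics page should show nonzero GPU utilization during inference.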