Secret key rate drops after migrating to GCE compute cluster

Have your GCP administrator increase the minimum ports per VM instance.

Written by daniel.ruiz

Last published at: August 10th, 2025

Problem

When you use a Google Kubernetes Engine (GKE) compute cluster, you can obtain between 50 and 100 secret keys per minute through the API without any issue. After you migrate to a Google Compute Engine (GCE) compute cluster, you notice a drop in the number of keys generated per minute.


Cause

There are two reasons for this behavior. First, Google Cloud NAT treats GKE VMs differently: it assigns a minimum of 1024 ports to each GKE VM.


For GCE VMs, the 1024-port minimum is not enforced. Instead, the NAT gateway uses whichever minimum ports-per-VM value it is configured with, which defaults to 64. You can review details in the “GKE interactions” section of Google’s Cloud NAT product interactions documentation.
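
You can confirm the value your NAT gateway is currently using. The following is a minimal sketch using the google-cloud-compute Python client; the project, region, and Cloud Router names are placeholders you would replace with your own.

```python
from google.cloud import compute_v1

# Placeholders: substitute your own project, region, and Cloud Router name.
client = compute_v1.RoutersClient()
router = client.get(project="my-project", region="us-central1", router="my-router")

for nat in router.nats:
    # A value of 0 usually means the field is unset, in which case the
    # Cloud NAT default of 64 ports per VM (with static port allocation) applies.
    print(f"{nat.name}: min_ports_per_vm={nat.min_ports_per_vm}")
```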


Second, you may be exhausting those 64 ports quickly by opening close to 64 connections within a two-minute window.


For context, each secret fetch requires a new TCP connection to the control plane, and that connection consumes one of the VM’s allocated NAT ports. Even after the connection is closed, the port remains in the TCP TIME_WAIT state for 120 seconds, so it is held for two minutes before it is released.
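
As a rough back-of-the-envelope illustration, assuming the default of 64 NAT ports per VM and the 120-second TIME_WAIT hold, the sustained rate of new connections is capped at about 32 per minute, which is consistent with the drop from the 50 to 100 keys per minute seen on GKE.

```python
# Rough ceiling on new outbound connections per minute with default GCE NAT settings:
# each connection consumes one port, and the port stays reserved for the full TIME_WAIT.
ports_per_vm = 64          # Cloud NAT default minimum ports per VM
time_wait_seconds = 120    # port stays held this long after the connection closes

max_new_connections_per_minute = ports_per_vm / (time_wait_seconds / 60)
print(max_new_connections_per_minute)  # ~32, well below 50-100 keys per minute
```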


Solution

For a GCE compute cluster, increase the NAT gateway’s minimum ports per VM instance to a higher value, such as 1024.


Your GCP administrator should make this change, since the NAT configuration is not managed by Databricks. For instructions, see the “Impact of tuning NAT configurations on existing NAT connections” section of Google’s Tune NAT configuration documentation.
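
The change itself is made on the Cloud NAT gateway, for example in the Cloud Console, with `gcloud compute routers nats update NAT_NAME --router=ROUTER_NAME --region=REGION --min-ports-per-vm=1024`, or programmatically. The following is a minimal sketch using the google-cloud-compute Python client; the project, region, Cloud Router, and NAT gateway names are placeholders you would replace with your own.

```python
from google.cloud import compute_v1

# Placeholders: substitute your own project, region, Cloud Router, and NAT gateway names.
PROJECT, REGION, ROUTER, NAT = "my-project", "us-central1", "my-router", "my-nat"

client = compute_v1.RoutersClient()
router = client.get(project=PROJECT, region=REGION, router=ROUTER)

# Raise the per-VM port allocation on the target NAT gateway.
for nat in router.nats:
    if nat.name == NAT:
        nat.min_ports_per_vm = 1024

operation = client.patch(
    project=PROJECT, region=REGION, router=ROUTER, router_resource=router
)
operation.result()  # block until the NAT update completes
```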