Apache Spark job fails with DatabricksThrottledException error

Remove Spark settings causing throttling and set a regional STS endpoint.

Written by zhengxian.huang

Last published at: April 18th, 2025

Problem

Your Apache Spark job fails with an error message such as the following.

shaded.databricks.org.apache.hadoop.fs.s3a.DatabricksThrottledException: Instantiate shaded.databricks.org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider on : com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: Rate exceeded (Service: AWSSecurityTokenService; Status Code: 400; Error Code: Throttling; Request ID: XXXXXXXX; Proxy: null)

 

Cause

An excessive number of AssumeRole API calls from the same instance type within a short timeframe leads to throttling by AWS Security Token Service (STS).

This excessive call volume can occur when Spark configurations that should not be modified are overridden, such as:

  • spark.hadoop.fs.s3.impl
  • spark.hadoop.fs.s3n.impl
  • spark.hadoop.fs.s3a.impl

 

Additionally, you may have the STS endpoint set at the global level, which contributes to the issue and is more costly. For example, using the global endpoint sts.amazonaws.com instead of the regional endpoint sts.<region>.amazonaws.com can lead to failures at Spark runtime.
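
To see whether a cluster carries these overrides, you can inspect its Spark configuration from a notebook. The following is a minimal sketch, assuming a Databricks notebook where spark is predefined; the spark.hadoop.fs.s3a.assumed.role.sts.endpoint key is the Hadoop S3A property for the assumed-role STS endpoint and is included only as an assumption about where a global endpoint might be set in your environment.

# Minimal sketch: list the problematic filesystem overrides and any explicit
# STS endpoint on the current cluster. `spark` is the notebook's SparkSession.
# The STS endpoint key is a Hadoop S3A property and is an assumption about
# where a global endpoint may have been configured in your workspace.
keys_to_check = [
    "spark.hadoop.fs.s3.impl",
    "spark.hadoop.fs.s3n.impl",
    "spark.hadoop.fs.s3a.impl",
    "spark.hadoop.fs.s3a.assumed.role.sts.endpoint",
]

for key in keys_to_check:
    value = spark.conf.get(key, None)  # None means the key is not explicitly set
    print(f"{key} = {value!r}")

Any key that prints a non-None value on a cluster is a candidate for the cleanup described in the Solution section.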

 

Solution

  1. Remove the following Spark configurations from all clusters, including interactive and job clusters.
    • spark.hadoop.fs.s3.impl
    • spark.hadoop.fs.s3n.impl
    • spark.hadoop.fs.s3a.impl
  2. Ensure that the STS endpoint is set at the regional level, using the regional endpoint format sts.<region>.amazonaws.com. A verification sketch follows this list.
  3. Monitor STS AssumeRole calls, for example through AWS CloudTrail, to confirm that the changes have resolved the throttling. A monitoring sketch follows this list.
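
For step 2, the Hadoop S3A assumed-role properties can be set in the cluster-level Spark config and then verified from a notebook. The sketch below assumes the property names fs.s3a.assumed.role.sts.endpoint and fs.s3a.assumed.role.sts.endpoint.region (from the Hadoop S3A documentation) and uses us-east-1 as a placeholder region; confirm both against the hadoop-aws version bundled with your Databricks Runtime.

# Minimal verification sketch for step 2 (Databricks notebook, `spark` predefined).
# Assumes the cluster's Spark config sets a regional STS endpoint, for example:
#   spark.hadoop.fs.s3a.assumed.role.sts.endpoint sts.us-east-1.amazonaws.com
#   spark.hadoop.fs.s3a.assumed.role.sts.endpoint.region us-east-1
# us-east-1 is a placeholder; the property names are Hadoop S3A settings.
endpoint = spark.conf.get("spark.hadoop.fs.s3a.assumed.role.sts.endpoint", "")

if not endpoint:
    print("No explicit STS endpoint is set; the AWS SDK default may resolve to the global endpoint.")
elif endpoint == "sts.amazonaws.com":
    print("Global STS endpoint in use; switch to sts.<region>.amazonaws.com.")
else:
    print(f"Regional STS endpoint in effect: {endpoint}")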

 
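For step 3, one way to monitor AssumeRole volume is to query AWS CloudTrail. The sketch below uses boto3's CloudTrail lookup_events API and is only an illustration: it assumes credentials with cloudtrail:LookupEvents permission are available, us-east-1 is a placeholder region, and the one-hour window is an arbitrary choice.

# Minimal monitoring sketch for step 3: count AssumeRole events recorded by
# CloudTrail over the last hour. Assumes AWS credentials with
# cloudtrail:LookupEvents permission; us-east-1 is a placeholder region.
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

kwargs = {
    "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": "AssumeRole"}],
    "StartTime": start,
    "EndTime": end,
}

count = 0
while True:
    response = cloudtrail.lookup_events(**kwargs)
    count += len(response["Events"])
    next_token = response.get("NextToken")
    if not next_token:
        break
    kwargs["NextToken"] = next_token

print(f"AssumeRole events in the last hour: {count}")

A steadily lower AssumeRole count after the configuration changes indicates the throttling has been addressed.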

For further information, refer to the Configure a customer-managed VPC documentation.