Inconsistent job performance with shuffle-intensive workloads

Use an instance series with locally-attached NVMe SSD.

Written by joel.robin

Last published at: June 4th, 2025

Problem

You are running a job on a general-purpose, compute-optimized, or memory-optimized instance series such as M5, M6i, C5, C6i, R5, R6i. 

 

The job involves heavy shuffle read/write operations due to transformations such as groupByKey, joindistinctrepartitionreduceByKey, and sortBy. The job consequently exhibits inconsistent runtimes even with a consistent input data size.

 

Cause

The previously-listed instance series do not have locally-attached SSDs, instead relying on network-attached EBS volumes for shuffle storage. 

 

The reliance on network I/O introduces additional latency and contention, especially during intensive shuffle operations, leading to non-deterministic execution times influenced by network bandwidth availability.

 

Solution

Use an instance series such as: 

  • M5d
  • M6id
  • C5d
  • C6id
  • R5d 
  • R6id

Any of these are equipped with locally-attached NVMe (Non-Volatile Memory Express) SSDs. 

These instances series provide high-throughput, low-latency local storage, significantly improving shuffle performance and ensuring more stable and predictable job runtimes for shuffle-intensive workloads.