The Executors tab in the Spark UI shows less memory than is actually available on the node:
An m4.xlarge instance (16 GB ram, 4 core) for the driver node, shows
4.5 GBmemory on the Executors tab.
An m4.large instance (8 GB ram, 2 core) for the driver node, shows
710 GBmemory on the Executors tab:
The total amount of memory shown is less than the memory on the cluster because some memory is occupied by kernel and node-level services.
To calculate the available amount of memory, you can use the formula used for executor memory allocation
(all_memory_size * 0.97 - 4800MB) * 0.8, where:
- 0.97 accounts for kernel overhead.
- 4800 MB accounts for internal node-level services (node daemon, log daemon, and so on).
- 0.8 is a heuristic to ensure the LXC container running the Spark process doesn’t crash due to out-of-memory errors.
Total available memory for storage on an m4.large instance is
(8192MB * 0.97 - 4800MB) * 0.8 - 1024 = 1.2 GB. Because the parameter
spark.memory.fraction is by default 0.6, approximately
(1.2 * 0.6) = ~710 MB is available for storage.
You can change the
spark.memory.fraction Spark configuration to adjust this parameter. Calculate the available memory for a new parameter as follows:
- If you use an m4.large instance, which has 8192 MB memory, it has available memory 1.2 GB.
- If you specify a
spark.memory.fractionof 0.8, the Executors tab in the Spark UI should show:
(1.2 * 0.8)GB = ~960 MB.