Receiving a “no space left on device” error message when attempting to use Apache Spark

Switch to a VM instance with attached local storage or increase the size of the VM boot disk.

Written by raphael.balogo

Last published at: January 30th, 2025

Problem

You are attempting to use Apache Spark but are getting general failures with no obvious cause.

When you enable compute log delivery, or review the failed stage tasks in the Spark UI, you see a “No space left on device” error message in one (or more) of the executors.

dd/mm/yy hh:mm:ss ERROR Executor: Exception in task x.x in stage x.x (TID x)
java.lang.RuntimeException: Error writing to file "/local_disk0/…": No space left on device.

Cause

One or more of the executors does not have enough local disk space. 

This can occur when you select a GCP VM instance type that does not have a local SSD attached, leaving little or no disk space for required operations such as shuffles. The sketch below illustrates the kind of workload involved.
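
For context, the following is a minimal PySpark sketch of a shuffle-heavy workload of the kind that triggers this error. The data volume, column names, and output path are illustrative assumptions, not taken from the original failure.

# Minimal sketch: a wide transformation (groupBy) forces a shuffle.
# Spark stages each shuffle's intermediate files on the executors'
# local disks (on Databricks, under /local_disk0), which is where
# "No space left on device" typically surfaces.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("shuffle-demo").getOrCreate()

# Illustrative data volume; real workloads fail at much larger sizes.
df = spark.range(10_000_000).withColumn("bucket", F.col("id") % 1000)

# groupBy redistributes rows across executors -- a shuffle -- so each
# executor must first write shuffle blocks to its local disk.
counts = df.groupBy("bucket").count()
counts.write.mode("overwrite").parquet("/tmp/shuffle_demo_output")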

Solution

Switch the executor node type to a GCP instance type that includes local storage.

If changing the VM instance type is not an option, you can set gcp_attributes.boot_disk_size through the Databricks REST API when creating the cluster, giving machines without a local storage disk a larger boot disk to work with. This should help alleviate the problem.

The following is an example of a POST request body. Fill in your GCP details before sending.

{
  "cluster_name": "<name>",
  "spark_version": "<version>",
  "node_type_id": "<instance-name>",
  "num_workers": 0,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*, 4]"
  },
  "custom_tags": {
    "ResourceClass": "SingleNode"
  },
  "gcp_attributes": {
    "boot_disk_size": <size-in-gb>
  }
}
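
As a usage sketch, this body can be sent to the Clusters API create endpoint (POST /api/2.0/clusters/create). The workspace URL, token, cluster name, and 200 GB boot disk size below are placeholder assumptions; substitute your own values.

# Sketch: create the cluster through the Databricks Clusters API.
# <databricks-instance> and <personal-access-token> are placeholders.
import requests

payload = {
    "cluster_name": "boot-disk-example",  # illustrative name
    "spark_version": "<version>",
    "node_type_id": "<instance-name>",
    "num_workers": 0,
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*, 4]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
    "gcp_attributes": {"boot_disk_size": 200},  # illustrative size in GB
}

response = requests.post(
    "https://<databricks-instance>/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
response.raise_for_status()
print(response.json().get("cluster_id"))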

Note that the amount of disk space to allocate depends on the workload, the amount of shuffle, and the join conditions. A good starting point is double the amount of input data; for example, a job that reads 80 GB of input would start with a 160 GB boot disk. Databricks recommends a minimum of 100 GB.