NullPointerException in Apache Spark when using sc.parallelize().foreach() with write operations

Iterate over table names sequentially using a standard foreach loop instead.

Written by saritha.shivakumar

Last published at: January 20th, 2026

Problem

You’re using Scala in a notebook to parallelize write operations across multiple tables. The following code shows the pattern.

%scala 

def f2(x: String): Unit = {
  val data = Array(<your-array>)
  val dataRDD = sc.parallelize(data)  // fails here: sc is null inside executor tasks
  val dataDF = dataRDD.toDF()
  val tempFile = "<your-base-path>".replaceAll("destinationTableName", x)
  dataDF.write.mode("overwrite").parquet(tempFile)
  println(x)
}

val tables = Array(<your-table-names>)
sc.parallelize(tables).foreach(f2)  // ships f2 to executors as task closures

When you run the cell, you encounter the following error in the output field.

Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in stage 3.0 (TID 10) (10.101.174.123 executor 0): java.lang.NullPointerException: Cannot invoke "org.apache.spark.SparkContext.parallelize$default$2()" because the return value of "$line1b8899e3320d4fb98e335d30f4db19a03.$read$$iw$$iw.sc()" is null

Cause

The function f2 references sc, the notebook’s SparkContext. A SparkContext exists only on the driver and is not serialized into the task closures that Spark ships to executors. When sc.parallelize(tables).foreach(f2) distributes the loop, f2 runs inside executor tasks, where the captured sc reference resolves to null.

In the example in the previous section, the first statement in f2 that touches the context, sc.parallelize(data), therefore invokes a method on a null reference, and every task in the stage fails with the NullPointerException shown in the error output.
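
You can reproduce the failure in a notebook without any write operation; any use of sc inside a task closure fails the same way. The following minimal sketch (not part of the original reproduction) triggers the same stack trace:

%scala

// sc exists only on the driver; inside the task closure it deserializes
// to null, so the nested parallelize call throws a NullPointerException.
sc.parallelize(Seq("a", "b")).foreach { _ =>
  sc.parallelize(Seq(1, 2))
}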

Solution

Iterate over table names sequentially using a standard foreach loop instead of distributing the loop itself with sc.parallelize.

A standard foreach runs f2 on the driver, where the SparkContext is available, so the tables are written one after another as ordinary Spark jobs and the stage failure no longer occurs.

%scala
def f2(x: String): Unit = {
  val data = Array(<your-array>)
  val dataRDD = sc.parallelize(data)  // safe: f2 now runs on the driver, where sc is available
  val dataDF = dataRDD.toDF()
  val tempFile = "<your-base-path>".replaceAll("destinationTableName", x)
  dataDF.write.mode("overwrite").parquet(tempFile)
  println(x)
}

val tables = Array(<your-table-names>)
tables.foreach(f2)  // sequential loop on the driver
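
If you genuinely need the tables written in parallel, run f2 from concurrent threads on the driver rather than from executor tasks; Spark’s scheduler is thread-safe and accepts job submissions from multiple driver threads. The following sketch, which goes beyond the documented fix, uses Scala Futures for this purpose:

%scala

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Each Future calls f2 on a driver-side thread, so sc stays available.
val pending = tables.map(table => Future(f2(table)))
Await.result(Future.sequence(pending.toSeq), Duration.Inf)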