Multiple Apache Spark JAR jobs fail when run concurrently

Apache Spark JAR jobs fail with an AnalysisException error when run concurrently.

Written by Adam Pavlacka

Last published at: February 28th, 2023

Problem

If you run multiple Apache Spark JAR jobs concurrently, some of the runs might fail with the error:

org.apache.spark.sql.AnalysisException: Table or view not found: xxxxxxx; line 1 pos 48

Cause

This error occurs due to a bug in Scala. When an object extends App, its val fields are no longer immutable; they are assigned when the main method is called, and they are reassigned each time it runs. If you run a JAR job multiple times concurrently, a val field containing a DataFrame can be changed inadvertently.

As a result, when any one of the concurrent runs finishes, it wipes out the temporary views of the other runs. Scala issue 11576 provides more detail.
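
The following is a minimal sketch of the failure mode; the DataFrame, the view name my_temp_view, and the query are hypothetical stand-ins for your job's logic. Because the object extends App, its body becomes the generated main() method, so every concurrent run re-executes it against the same shared vals:

%scala

  import org.apache.spark.sql.SparkSession

  object MainTest extends App {
    // Because MainTest extends App, these vals are assigned inside the
    // generated main() method, not when the object is first loaded. Every
    // concurrent run of the job re-executes this body and re-assigns them.
    val spark = SparkSession.builder.getOrCreate()
    val df = spark.range(10).toDF("id")

    // The temporary view is shared state: when one concurrent run finishes
    // and cleans up, another run's lookup of the view can fail with
    // "Table or view not found".
    df.createOrReplaceTempView("my_temp_view")
    spark.sql("SELECT COUNT(*) FROM my_temp_view").show()
  }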

Solution

To work around this bug, define the main() method explicitly instead of extending App. For example, if you have code similar to this:

%scala

  object MainTest extends App {
    ...
  }

You can replace it with code that does not extend App:

%scala

  object MainTest {
    def main(args: Array[String]): Unit = {
      ...
    }
  }
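
Applied to the same hypothetical example from above, the workaround looks like this. The vals now live inside main(), so each run works with its own local copies instead of shared object state:

%scala

  import org.apache.spark.sql.SparkSession

  object MainTest {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder.getOrCreate()

      // These vals are locals of main(), created fresh for each run, so
      // concurrent runs no longer overwrite each other's state.
      val df = spark.range(10).toDF("id")
      df.createOrReplaceTempView("my_temp_view")
      spark.sql("SELECT COUNT(*) FROM my_temp_view").show()
    }
  }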