Some common issues come up when you work with notebooks. This section covers frequently asked questions and the best practices you should follow.
Spark job fails with java.lang.NoClassDefFoundError
Sometimes you may come across an error like:
%scala
java.lang.NoClassDefFoundError: Could not initialize class line.....$read$
This can occur with a Spark Scala 2.11 cluster and a Scala notebook if you mix a case class definition and Dataset/DataFrame operations in the same notebook cell, and then use the case class in a Spark job in a different cell. The notebook runtime wraps each cell's contents in a generated wrapper object (the line...$read class in the error message), so a closure that references the case class captures the entire cell, and that wrapper can fail to initialize on the executors. For example, in the first cell, say you define a case class MyClass and also create a Dataset.
%scala
case class MyClass(value: Int)
val dataset = spark.createDataset(Seq(1))
Then in a later cell, you create instances of MyClass inside a Spark job.
%scala
dataset.map { i => MyClass(i) }.count()
Solution
Move the case class definition to a cell of its own.
%scala
case class MyClass(value: Int) // no other code in this cell
%scala
val dataset = spark.createDataset(Seq(1))
dataset.map { i => MyClass(i) }.count()
Spark job fails with java.lang.UnsupportedOperationException
Sometimes you may come across an error like:
java.lang.UnsupportedOperationException: Accumulator must be registered before send to executor
This can occur with a Spark Scala 2.10 cluster and a Scala notebook. The cause and solution for this error are the same as for the Spark job fails with java.lang.NoClassDefFoundError error above.
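As above, move the case class definition to a cell of its own. A minimal sketch mirroring the earlier example (MyClass and the sample Dataset are illustrative):

%scala
case class MyClass(value: Int) // define the case class in a cell by itself

%scala
// create the Dataset and run the Spark job in a separate cell
val dataset = spark.createDataset(Seq(1))
dataset.map { i => MyClass(i) }.count()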