Intermittent NullPointerException when AQE is enabled

Problem

You get an intermittent NullPointerException error when saving your data.

Py4JJavaError: An error occurred while calling o2892.save.
: java.lang.NullPointerException
    at org.apache.spark.sql.execution.adaptive.OptimizeSkewedJoin.$anonfun$getMapSizesForReduceId$1(OptimizeSkewedJoin.scala:167)
    at org.apache.spark.sql.execution.adaptive.OptimizeSkewedJoin.$anonfun$getMapSizesForReduceId$1$adapted(OptimizeSkewedJoin.scala:167)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    ....
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

Cause

This error can occur if adaptive query execution (AQE) is enabled and you are joining data. If AQE is enabled, skew join is also enabled.

If any of the shuffle data fails due to a cluster scaling down event it generates a NullPointerException error.

Solution

Set spark.sql.adaptive.skewJoin.enabled to false in your Spark configuration.