Updated May 11th, 2022 by ashish

Apache Spark UI shows wrong number of jobs

Problem: You are reviewing the number of active Apache Spark jobs on a cluster in the Spark UI, but the number is too high to be accurate. If you restart the cluster, the number of jobs shown in the Spark UI is correct at first, but over time it grows abnormally high.
Cause: The Spark UI is not always accurate for large or long-running clusters due ...
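Rather than trusting the UI counter alone, the number of genuinely active jobs can be cross-checked against Spark's monitoring REST API, which the driver serves alongside the UI. A minimal sketch; the driver address and application ID below are hypothetical placeholders:

```python
import json
from urllib.request import urlopen

def running_jobs_url(base_url, app_id):
    # Spark's monitoring REST API exposes per-application job status;
    # status=running filters out jobs the UI may still be displaying.
    return f"{base_url}/api/v1/applications/{app_id}/jobs?status=running"

def count_running_jobs(base_url, app_id):
    # Returns the number of jobs the driver itself reports as running.
    with urlopen(running_jobs_url(base_url, app_id)) as resp:
        return len(json.load(resp))

# Hypothetical driver address and application ID:
# count_running_jobs("http://driver-node:4040", "app-20220511120000-0001")
```

Comparing this count against the UI after the cluster has been up for a while makes the drift easy to quantify.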

Updated May 19th, 2022 by ashish

Job remains idle before starting

Problem: You have an Apache Spark job that is triggered correctly, but remains idle for a long time before starting. Alternatively, you have a Spark job that ran well for a while, but goes idle for a long time before resuming. Symptoms include:
- Cluster downscales to the minimum number of worker nodes during idle time.
- Driver logs don’t show any Spark jobs during idl...

Updated May 19th, 2022 by ashish

Conflicting directory structures error

Problem: You have an Apache Spark job that is failing with a Java assertion error:
java.lang.AssertionError: assertion failed: Conflicting directory structures detected
Example stack trace:
Caused by: org.apache.spark.sql.streaming.StreamingQueryException: There was an error when trying to infer the partition schema of the current batch of files. Plea...
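This assertion comes from Spark's partition discovery, which infers partition columns from key=value directory segments in file paths and fails when different leaf files imply different layouts. A pure-Python sketch of the idea (not Spark's actual implementation, and the bucket paths are made up):

```python
def infer_partitions(path):
    """Extract the sequence of partition column names (key=value
    directory segments) from a file path, Spark-discovery style."""
    return tuple(seg.split("=")[0] for seg in path.split("/") if "=" in seg)

def check_consistent(paths):
    """All leaf files must imply the same partition columns;
    otherwise the directory structures 'conflict'."""
    schemas = {infer_partitions(p) for p in paths}
    return len(schemas) <= 1

# Same layout under every leaf file: fine.
ok = check_consistent([
    "s3://bucket/table/date=2022-05-01/part-0.parquet",
    "s3://bucket/table/date=2022-05-02/part-0.parquet",
])

# One file sits at a different depth with no partition segment: conflict.
bad = check_consistent([
    "s3://bucket/table/date=2022-05-01/part-0.parquet",
    "s3://bucket/table/part-0.parquet",
])
```

When mixed paths are intentional, the Spark SQL partition discovery documentation describes a basePath option that tells the reader which root to resolve partitions against.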

Updated May 19th, 2022 by ashish

Streaming job using Kinesis connector fails

Problem: You have a streaming job writing to a Kinesis sink, and it is failing with out-of-memory error messages:
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: Java heap space
Symptoms include:
- Ganglia shows a gradual increase in JVM memory usage.
- Microbatch analysis shows input and processing rates are consisten...

Updated May 11th, 2022 by ashish

Streaming job has degraded performance

Problem: You have a streaming job whose performance degrades over time. You start a new streaming job with the same configuration and the same source, and it performs better than the existing job.
Cause: Issues with old checkpoints can result in performance degradation in long-running streaming jobs. This can happen if the job was intermittently ha...
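When a stale checkpoint is the suspected cause, one common mitigation is to restart the query against a fresh checkpoint location while keeping the old one for inspection. A hedged sketch; the /tmp path below is a stand-in for the job's real checkpointLocation:

```shell
#!/bin/sh
# Stand-in path; substitute your streaming query's checkpointLocation.
CKPT=/tmp/demo_stream_ckpt

mkdir -p "$CKPT/offsets"     # simulate an existing checkpoint directory
mv "$CKPT" "${CKPT}.bak"     # set the old state aside for inspection
mkdir -p "$CKPT"             # the restarted query begins with a clean slate
```

Note that abandoning a checkpoint also abandons the query's recorded progress and offsets, so depending on the source and output mode the restarted job may reprocess or skip data.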
