Updated June 1st, 2022 by vikas.yadav

Kafka client terminated with OffsetOutOfRangeException

Problem: You have an Apache Spark application that is terminated with a kafkashaded.org.apache.kafka.clients.consumer.OffsetOutOfRangeException error message while it is fetching messages from an Apache Kafka source.
Cause: Your Spark application is trying to fetch offsets whose data has already expired in Kafka. We generally see this in two scenarios: Sc...
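The cause can be pictured with a small model. The sketch below is a hypothetical, pure-Python simulation, not Spark or Kafka client code. It assumes a partition retains only offsets from its log start offset up to its log end offset, so a consumer whose saved position falls outside that window gets an out-of-range error. This happens, for example, when retention has already deleted the records a checkpoint points to. The names `RetainedRange`, `check_fetch_offset`, and `OffsetOutOfRange` are illustrative assumptions.

```python
from dataclasses import dataclass


class OffsetOutOfRange(Exception):
    """Hypothetical stand-in for Kafka's OffsetOutOfRangeException."""


@dataclass(frozen=True)
class RetainedRange:
    """Offsets a single Kafka partition still holds (illustrative model)."""
    log_start: int  # earliest retained offset; older records were deleted by retention
    log_end: int    # offset the next produced record will receive


def check_fetch_offset(requested: int, retained: RetainedRange) -> int:
    """Return `requested` if a fetch there is valid, else raise OffsetOutOfRange.

    Fetching exactly at log_end is allowed (it simply returns no records yet).
    Anything below log_start has expired; anything above log_end does not exist.
    """
    if requested < retained.log_start or requested > retained.log_end:
        raise OffsetOutOfRange(
            f"offset {requested} outside retained range "
            f"[{retained.log_start}, {retained.log_end}]"
        )
    return requested


if __name__ == "__main__":
    # A checkpoint saved offset 100, but retention has since moved log_start to 250.
    partition = RetainedRange(log_start=250, log_end=400)
    print(check_fetch_offset(300, partition))  # still retained, so it is returned
    try:
        check_fetch_offset(100, partition)
    except OffsetOutOfRange as err:
        print("expired:", err)
```

The model leaves out everything Spark and the Kafka client do around this check. It only shows why an offset that has aged out of retention can no longer be fetched.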

Updated May 23rd, 2022 by vikas.yadav

Duplicate columns in the metadata error

Problem: Your Apache Spark job fails with the following error message while processing a Delta table:
org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the metadata update: col1, col2...
Cause: There are duplicate column names in the Delta table. Column names that differ only by case are considered duplicates. Delta Lake is case prese...
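Column names that differ only by case count as duplicates, so one quick way to spot the offending columns before writing is to group names case-insensitively. The helper below is a hypothetical, plain-Python sketch. It assumes you pass it a list of column names (in PySpark, a DataFrame's `columns` attribute returns such a list) and uses `str.lower()` as the case-insensitive comparison.

```python
from collections import defaultdict


def find_case_insensitive_duplicates(columns):
    """Group column names that collide when case is ignored.

    Returns a dict mapping each lower-cased name to its original spellings,
    keeping only names that occur more than once.
    """
    groups = defaultdict(list)
    for name in columns:
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    cols = ["id", "col1", "Col1", "col2", "COL2"]
    print(find_case_insensitive_duplicates(cols))
    # {'col1': ['col1', 'Col1'], 'col2': ['col2', 'COL2']}
```

An empty result means no two names collide when case is ignored. Exact repeats such as `["a", "a"]` are reported as well.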

Updated May 16th, 2022 by vikas.yadav

MLflow project fails to access an Apache Hive table

Problem: You have an MLflow project that fails to access a Hive table and returns a Table or view not found error:
pyspark.sql.utils.AnalysisException: "Table or view not found: `default`.`tab1`; line 1 pos 21;\n'Aggregate [unresolvedalias(count(1), None)]\n+- 'UnresolvedRelation `default`.`tab1`\n" xxxxx ERROR mlflow.cli: === Run (ID 'xxxxx') failed...