Updated May 20th, 2022 by shanmugavel.chandrakasu

Apache Spark job fails with Parquet column cannot be converted error

Problem: You are reading data in Parquet format and writing to a Delta table when you get a Parquet column cannot be converted error message. The cluster is running Databricks Runtime 7.3 LTS or above. org.apache.spark.SparkException: Task failed while writing rows. Caused by: com.databricks.sql.io.FileReadException: Error while reading file s3://buc...
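For context, a minimal sketch of the read-Parquet, write-Delta pattern that can surface this error is shown below. The bucket path, target path, and column name (amount) are placeholders, and the explicit cast is one hedged workaround for a physical/logical type mismatch, not the article's full resolution.

```python
from pyspark.sql.functions import col

# Hypothetical paths and column name, for illustration only.
source_path = "s3://<bucket>/<path>/"   # Parquet source (placeholder)
target_path = "/mnt/delta/events"       # Delta target (placeholder)

df = spark.read.parquet(source_path)

# If some Parquet files store a column (here `amount`) with a physical type
# that does not match the expected logical type, the vectorized reader can
# fail with "Parquet column cannot be converted". Casting explicitly before
# the write is one common workaround.
df = df.withColumn("amount", col("amount").cast("double"))

df.write.format("delta").mode("append").save(target_path)
```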

Updated May 16th, 2022 by shanmugavel.chandrakasu

H2O.ai Sparkling Water cluster not reachable

Problem: You are trying to initialize H2O.ai’s Sparkling Water on Databricks Runtime 7.0 and above when you get an H2OClusterNotReachableException error message. %python import ai.h2o.sparkling._ val h2oContext = H2OContext.getOrCreate() ai.h2o.sparkling.backend.exceptions.H2OClusterNotReachableException: H2O cluster X.X.X.X:54321 - sparkling-water-ro...
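The excerpt's snippet pairs a %python magic with Scala syntax; a rough PySparkling equivalent is sketched below, assuming the Sparkling Water library matching your Databricks Runtime is attached to the cluster.

```python
# Minimal PySparkling initialization sketch (assumes the Sparkling Water
# package for your runtime version is installed on the cluster).
from pysparkling import H2OContext

# H2OClusterNotReachableException is typically raised here when the Spark
# driver cannot reach the H2O nodes on the client port (54321 by default).
hc = H2OContext.getOrCreate()
print(hc)
```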

Updated May 16th, 2022 by shanmugavel.chandrakasu

Download artifacts from MLflow

By default, the MLflow client saves artifacts to an artifact store URI during an experiment. The artifact store URI is similar to /dbfs/databricks/mlflow-tracking/<experiment-id>/<run-id>/artifacts/. This artifact store is an MLflow-managed location, so you cannot download artifacts directly. You must use client.download_artifacts in the ...
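A minimal sketch of that workflow follows; the run ID, the artifact path ("model"), and the local destination directory are placeholders, not values from the article.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

run_id = "<run-id>"            # placeholder: run whose artifacts you want
local_dir = "/tmp/artifacts"   # placeholder: local destination directory

# download_artifacts copies files out of the MLflow-managed artifact store,
# since the dbfs:/databricks/mlflow-tracking/ location cannot be read directly.
local_path = client.download_artifacts(run_id, "model", local_dir)
print(f"Artifacts downloaded to: {local_path}")
```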

Updated May 23rd, 2022 by shanmugavel.chandrakasu

from_json returns null in Apache Spark 3.0

Problem: The from_json function is used to parse a JSON string and return a struct of values. For example, if you have the JSON string [{"id":"001","name":"peter"}], you can pass it to from_json with a schema and get parsed struct values in return. %python from pyspark.sql.functions import col, from_json display(   df.select(col('value'), from_json(c...
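For illustration, a runnable sketch of that call follows; the single-column DataFrame and the array-of-struct schema are assumptions matching the example string, not the article's full resolution.

```python
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

# Hypothetical single-column DataFrame holding the example JSON string.
df = spark.createDataFrame([('[{"id":"001","name":"peter"}]',)], ["value"])

# The string is a JSON array, so the schema is an array of structs; passing a
# schema that does not match the data is one way from_json ends up returning
# null in Spark 3.0.
schema = ArrayType(StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
]))

display(df.select(col("value"), from_json(col("value"), schema)))
```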
