Apache Spark job fails with Parquet column cannot be converted error
Problem: You are reading data in Parquet format and writing it to a Delta table when you get a Parquet column cannot be converted error message. The cluster is running Databricks Runtime 7.3 LTS or above. org.apache.spark.SparkException: Task failed while writing rows. Caused by: com.databricks.sql.io.FileReadException: Error while reading file s3://buc...
H2O.ai Sparkling Water cluster not reachable
Problem: You are trying to initialize H2O.ai’s Sparkling Water on Databricks Runtime 7.0 and above when you get an H2OClusterNotReachableException error message. %scala import ai.h2o.sparkling._ val h2oContext = H2OContext.getOrCreate() ai.h2o.sparkling.backend.exceptions.H2OClusterNotReachableException: H2O cluster X.X.X.X:54321 - sparkling-water-ro...
Download artifacts from MLflow
By default, the MLflow client saves artifacts to an artifact store URI during an experiment. The artifact store URI is similar to /dbfs/databricks/mlflow-tracking/<experiment-id>/<run-id>/artifacts/. This artifact store is an MLflow-managed location, so you cannot download artifacts directly. You must use client.download_artifacts in the ...
from_json returns null in Apache Spark 3.0
Problem: The from_json function is used to parse a JSON string and return a struct of values. For example, if you have the JSON string [{"id":"001","name":"peter"}], you can pass it to from_json with a schema and get parsed struct values in return. %python from pyspark.sql.functions import col, from_json display( df.select(col('value'), from_json(c...