Updated May 16th, 2022 by Gobinath.Viswanathan

Autoscaling is slow with an external metastore

Problem: You have an external metastore configured on your cluster and autoscaling is enabled, but the cluster is not autoscaling effectively. Cause: You are copying the metastore jars to every executor, when they are only needed in the driver. It takes time to initialize and run the jars every time a new executor spins up. As a result, adding more ex...

1 min reading time
Updated March 4th, 2022 by Gobinath.Viswanathan

Cluster Apache Spark configuration not applied

Problem: Your cluster’s Spark configuration values are not applied. Cause: This happens when the Spark config values are declared in the cluster configuration as well as in an init script. When Spark config values are located in more than one place, the configuration in the init script takes precedence and the cluster ignores the configuration setting...

0 min reading time
Updated April 11th, 2023 by Gobinath.Viswanathan

FileReadException on DBFS mounted filesystem

Problem: Your Apache Spark jobs are failing with a FileReadException error when attempting to read files on DBFS (Databricks File System) mounted paths. org.apache.spark.SparkException: Job aborted due to stage failure: Task x in stage y failed n times, most recent failure: Lost task 0.3 in stage 141.0 (TID 770) (x.y.z.z executor 0): com.databricks.s...

0 min reading time
Updated May 30th, 2023 by Gobinath.Viswanathan

Access S3 with temporary session credentials

You can use IAM session tokens with Hadoop config support to access S3 storage in Databricks Runtime 8.3 and above. Info: You cannot mount the S3 path as a DBFS mount when using session credentials. You must use the S3A URI. Extract the session credentials from your cluster. You will need the Instance...
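As a rough illustration of the Hadoop config approach mentioned above, the sketch below collects the standard Hadoop S3A settings for temporary credentials into a dictionary. The helper name `s3a_session_conf` and the placeholder values are hypothetical, and the commented-out lines that apply the settings assume a notebook where `sc` is already defined; the full article may do this differently.

```python
def s3a_session_conf(access_key, secret_key, session_token):
    """Return Hadoop S3A settings for temporary (session) credentials.

    Hypothetical helper for illustration; the keys are standard Hadoop S3A
    configuration properties.
    """
    return {
        "fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.session.token": session_token,
    }


# Placeholder values only; real credentials come from your own session.
conf = s3a_session_conf("<access-key>", "<secret-key>", "<session-token>")

# List the property names that would be set (values are not printed).
for key in sorted(conf):
    print(key)

# Assumption: on a cluster, a notebook-provided `sc` could apply the settings:
# for key, value in conf.items():
#     sc._jsc.hadoopConfiguration().set(key, value)
```

Data would then be read through an S3A URI (`s3a://...`) rather than a DBFS mount, in line with the note above.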

1 min reading time
Updated March 4th, 2022 by Gobinath.Viswanathan

IP access list update returns INVALID_STATE

Problem: You are trying to update an IP access list and you get an INVALID_STATE error message. {"error_code":"INVALID_STATE","message":"Your current IP 3.3.3.3 will not be allowed to access the workspace under current configuration"} Cause: The IP access list update that you are trying to commit does not include your current public IP address. If you...
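The cause described above can be checked locally before submitting an update. The sketch below uses only Python's standard `ipaddress` module; the function name `ip_allowed` and the sample entries are hypothetical, and it does not call any Databricks API.

```python
import ipaddress


def ip_allowed(current_ip, allow_entries):
    """Return True if current_ip falls inside any entry of a proposed allow list.

    Hypothetical local pre-check. Entries may be single addresses or CIDR ranges.
    """
    addr = ipaddress.ip_address(current_ip)
    return any(
        # strict=False accepts entries that have host bits set, e.g. 3.3.3.3/24
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in allow_entries
    )


# Sample values for illustration; 3.3.3.3 is the address from the error message.
proposed = ["10.0.0.0/8", "192.168.1.5"]
print(ip_allowed("3.3.3.3", proposed))
print(ip_allowed("3.3.3.3", proposed + ["3.3.3.0/24"]))
```

If the check returns `False` for your current public IP, the proposed list would lock you out, which matches the condition reported in the error message.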

0 min reading time