Updated May 16th, 2022 by harikrishnan.kunhumveettil

display() does not show microseconds correctly

Problem You want to display a timestamp value with microsecond precision, but when you use display() it does not show the value past milliseconds. For example, this Apache Spark SQL display() command: %sql display(spark.sql("select cast('2021-08-10T09:08:56.740436' as timestamp) as test")) Returns a truncated value: 2021-08-10T09:08:56.740+0000 Caus...

Updated May 10th, 2022 by harikrishnan.kunhumveettil

Job fails, but Apache Spark tasks finish

Problem Your Databricks job reports a failed status, but all Spark jobs and tasks have successfully completed. Cause You have explicitly called spark.stop() or System.exit(0) in your code. If either of these is called, the Spark context is stopped, but the graceful shutdown and handshake with the Databricks job service does not happen. Solution Do ...

Updated May 24th, 2022 by harikrishnan.kunhumveettil

JDBC write fails with a PrimaryKeyViolation error

Problem You are using JDBC to write to a SQL table that has primary key constraints, and the job fails with a PrimaryKeyViolation error. Alternatively, you are using JDBC to write to a SQL table that does not have primary key constraints, and you see duplicate entries in recently written tables. Cause When Apache Spark performs a JDBC write, one par...

Updated December 8th, 2022 by harikrishnan.kunhumveettil

Custom garbage collection prevents cluster launch

Problem You are trying to use a custom Apache Spark garbage collection algorithm (other than the default one, parallel garbage collection) on clusters running Databricks Runtime 10.0 and above. When you try to start a cluster, it fails to start. If the configuration is set on an executor, the executor is immediately terminated. For example, if you s...
