JDBC write fails with a PrimaryKeyViolation error
Problem: You are using JDBC to write to a SQL table that has primary key constraints, and the job fails with a PrimaryKeyViolation error. Alternatively, you are using JDBC to write to a SQL table that does not have primary key constraints, and you see duplicate entries in recently written tables. Cause: When Apache Spark performs a JDBC write, one par...
display() does not show microseconds correctly
Problem: You want to display a timestamp value with microsecond precision, but when you use display() it does not show the value past milliseconds. For example, this Apache Spark SQL display() command: %sql display(spark.sql("select cast('2021-08-10T09:08:56.740436' as timestamp) as test")) Returns a truncated value: 2021-08-10T09:08:56.740+0000 Caus...
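The truncation described in this excerpt can be illustrated outside Databricks. The following is a minimal sketch in plain Python using only the standard library. It does not call display() or Spark, and the variable names are hypothetical. It assumes only the timestamp literal from the example above, and shows that the value itself carries microseconds while a millisecond-precision rendering drops the last three digits.

```python
from datetime import datetime

# Timestamp literal taken from the example in the excerpt above.
ts = datetime.fromisoformat("2021-08-10T09:08:56.740436")

# Rendering with microsecond precision keeps all six fractional digits.
full = ts.isoformat(timespec="microseconds")

# Rendering with millisecond precision truncates the fractional part to three digits.
truncated = ts.isoformat(timespec="milliseconds")

print(full)
print(truncated)
```

The stored value is unchanged in both cases; only the rendering differs, which matches the symptom the excerpt describes.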
Custom garbage collection prevents cluster launch
Problem: You are trying to use a custom Apache Spark garbage collection algorithm (other than the default, parallel garbage collection) on clusters running Databricks Runtime 10.0 and above. When you try to start a cluster, it fails to start. If the configuration is set on an executor, the executor is immediately terminated. For example, if you s...
Job fails, but Apache Spark tasks finish
Problem: Your Databricks job reports a failed status, but all Spark jobs and tasks have successfully completed. Cause: You have explicitly called spark.stop() or System.exit(0) in your code. If either of these is called, the Spark context is stopped, but the graceful shutdown and handshake with the Databricks job service do not happen. Solution: Do ...