Updated July 22nd, 2022 by chetan.kardekar

Parsing post meridiem time (PM) with to_timestamp() returns null

Problem: You are trying to parse a 12-hour (AM/PM) time value with to_timestamp(), but instead of returning a 24-hour time value it returns null. For example, this sample code returns null when run: %sql SELECT to_timestamp('2016-12-31 10:12:00 PM', 'yyyy-MM-dd HH:mm:ss a'); Cause: to_timestamp() requires the hour format to be in lowercase. If the ho...
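
The difference between a 12-hour and a 24-hour hour field can be illustrated outside Spark. The sketch below uses Python's standard-library datetime.strptime as an analogy only; it is not Spark's to_timestamp(), and the variable names are illustrative. In Python, %I is the 12-hour field and %H is the 24-hour field. Unlike the null described above, Python does not fail when %H is paired with an AM/PM marker; it ignores the marker when computing the hour.

```python
from datetime import datetime

raw = "2016-12-31 10:12:00 PM"

# %I is the 12-hour clock field; combined with %p, the AM/PM marker is honored.
parsed_12h = datetime.strptime(raw, "%Y-%m-%d %I:%M:%S %p")

# %H is the 24-hour clock field; the %p marker is parsed but does not
# change the hour, so the PM information is lost.
parsed_24h = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S %p")

print(parsed_12h.strftime("%Y-%m-%d %H:%M:%S"))
print(parsed_24h.strftime("%Y-%m-%d %H:%M:%S"))
```

In both tools, the hour field in the pattern has to match the clock convention of the input string for the AM/PM marker to be interpreted as intended.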

0 min reading time
Updated May 16th, 2022 by chetan.kardekar

Hyperopt fails with maxNumConcurrentTasks error

Problem: You are tuning machine learning parameters using Hyperopt when your job fails with a py4j.Py4JException: Method maxNumConcurrentTasks([]) does not exist error. You are using a Databricks Runtime for Machine Learning (Databricks Runtime ML) cluster. Cause: Databricks Runtime ML has a compatible version of Hyperopt pre-installed (AWS | Azure | ...

0 min reading time
Updated May 10th, 2022 by chetan.kardekar

Identify duplicate data on append operations

A common issue when performing append operations on Delta tables is duplicate data. For example, assume user 1 performs a write operation on Delta table A. At the same time, user 2 performs an append operation on Delta table A. This can lead to duplicate records in the table. In this article, we review basic troubleshooting steps that you can use to...

1 min reading time
Updated July 8th, 2022 by chetan.kardekar

Apache Spark UI is not in sync with job

Problem: The status of your Spark jobs is not correctly shown in the Spark UI (AWS | Azure | GCP). Some of the jobs that are confirmed to be in the Completed state are shown as Active/Running in the Spark UI. In some cases the Spark UI may appear blank. When you review the driver logs, you see an AsyncEventQueue warning. Logs: 20/12/23 21:20:26 ...

1 min reading time