Updated February 24th, 2023 by manjunath.swamy

Recreate LISTAGG functionality with Spark SQL

LISTAGG is an aggregate function that concatenates a set of string values into a single string. An optional separator string can be provided, which is inserted between adjacent input strings. The syntax is: LISTAGG(<expression>, <separator>) WITHIN GROUP (ORDER BY …). LISTAGG is supported in many databases and data warehouses. However, it i...
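
To make the semantics concrete, here is a minimal plain-Python model of LISTAGG(value, separator) WITHIN GROUP (ORDER BY …) combined with a GROUP BY. It is not Spark code and not the article's method. The function name listagg, its parameters, and the sample rows are hypothetical. Dropping None values mirrors the common behavior of LISTAGG ignoring NULLs.

```python
from itertools import groupby
from operator import itemgetter


def listagg(rows, key, value, separator=",", order_by=None):
    """Model of LISTAGG(value, separator) WITHIN GROUP (ORDER BY order_by) ... GROUP BY key.

    rows is a list of dicts. Rows whose value is None are dropped first, so a group
    containing only NULLs is omitted instead of mapping to NULL.
    """
    order_by = order_by or value
    kept = [r for r in rows if r[value] is not None]
    result = {}
    # groupby only merges adjacent rows, so sort by the grouping key first.
    for k, group in groupby(sorted(kept, key=itemgetter(key)), key=itemgetter(key)):
        ordered = sorted(group, key=itemgetter(order_by))  # WITHIN GROUP (ORDER BY ...)
        result[k] = separator.join(str(r[value]) for r in ordered)
    return result


rows = [
    {"dept": "eng", "name": "Cho"},
    {"dept": "eng", "name": "Ada"},
    {"dept": "ops", "name": "Bo"},
    {"dept": "eng", "name": None},
]
print(listagg(rows, "dept", "name", ", "))  # {'eng': 'Ada, Cho', 'ops': 'Bo'}
```

In Spark SQL, a comparable string is commonly built with concat_ws(', ', array_sort(collect_list(name))) grouped by dept. The array_sort call is needed because collect_list does not guarantee element order. This is one common pattern and not necessarily the approach the full article describes.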

Updated May 23rd, 2022 by manjunath.swamy

Error when downloading full results after join

Problem: You are working with two tables in a notebook. You perform a join. You can preview the output, but when you try to Download full results you get an error: Error in SQL statement: AnalysisException: Found duplicate column(s) when inserting into dbfs:/databricks-results/ Reproduce error: Create two tables. %python from pyspark.sql.functions impo...
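
The error says that the joined result contains the same column name more than once. The sketch below is plain Python, not a Databricks API. It shows how to detect such clashes in a column list and one possible way to rename them. The helpers duplicate_columns and disambiguate, and the "_right" suffix, are hypothetical. Case-insensitive comparison assumes Spark's default case-insensitive column resolution.

```python
from collections import Counter


def duplicate_columns(columns):
    """Names that occur more than once, compared case-insensitively (an assumption)."""
    counts = Counter(c.lower() for c in columns)
    return sorted(name for name, n in counts.items() if n > 1)


def disambiguate(left_cols, right_cols, suffix="_right"):
    """Hypothetical helper: suffix right-side names that clash with names already used."""
    taken = {c.lower() for c in left_cols}
    renamed = []
    for col in right_cols:
        new = col
        while new.lower() in taken:  # keep appending the suffix until the name is unique
            new += suffix
        taken.add(new.lower())
        renamed.append(new)
    return list(left_cols) + renamed


joined = ["id", "name", "id", "value"]  # columns of a join that keeps both id columns
print(duplicate_columns(joined))                      # ['id']
print(disambiguate(["id", "name"], ["id", "value"]))  # ['id', 'name', 'id_right', 'value']
```

In a notebook, a list like this could be passed to DataFrame.toDF(*names), which renames a DataFrame's columns positionally, before you try to download the full results.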

Updated June 1st, 2022 by manjunath.swamy

Inconsistent timestamp results with JDBC applications

Problem: When using JDBC applications with Databricks clusters, you see inconsistent java.sql.Timestamp results when switching between standard time and daylight saving time. Cause: Databricks clusters use UTC by default, while java.sql.Timestamp uses the JVM's local time zone. If a Databricks cluster returns 2021-07-12 21:43:08 as a string, the JVM parses i...
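
The following plain-Python sketch illustrates why the result changes with the season. It assumes the client reads a UTC wall-clock string as local time. The excerpt is truncated before the exact parsing behavior is described, so treat this as an assumption. The US Pacific offsets are hardcoded so that no time zone database is needed. The function misread_shift is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed US Pacific offsets, hardcoded so no time zone database is needed.
PST = timezone(timedelta(hours=-8), "PST")  # standard time
PDT = timezone(timedelta(hours=-7), "PDT")  # daylight saving time


def misread_shift(utc_text, local_zone):
    """How far an instant moves when a UTC wall-clock string is read as local time."""
    naive = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S")
    intended = naive.replace(tzinfo=timezone.utc)  # what the cluster meant (UTC)
    misread = naive.replace(tzinfo=local_zone)     # what a local-time parser assumes
    return misread - intended


print(misread_shift("2021-07-12 21:43:08", PDT))  # 7:00:00
print(misread_shift("2021-01-12 21:43:08", PST))  # 8:00:00
```

The same string shifts by 7 hours in summer and by 8 hours in winter. That one-hour difference is the kind of inconsistency that appears when switching between daylight saving time and standard time.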

Updated May 11th, 2022 by manjunath.swamy

Job fails with invalid access token

Problem: Long-running jobs, such as streaming jobs, fail after 48 hours when using dbutils.secrets.get() (AWS | Azure | GCP). For example: %python streamingInputDF1 = (spark.readStream.format("delta").table("default.delta_sorce")) def writeIntodelta(batchDF, batchId): table_name = dbutil...
