Job fails with invalid access token
Problem Long-running jobs, such as streaming jobs, fail after 48 hours when using dbutils.secrets.get() (AWS | Azure | GCP). For example: %python streamingInputDF1 = ( spark .readStream .format("delta") .table("default.delta_sorce") ) def writeIntodelta(batchDF, batchId): table_name = dbutil...
0 min reading time
Recreate LISTAGG functionality with Spark SQL
LISTAGG is a function that aggregates a set of string elements into one string by concatenating them. An optional separator string, if provided, is inserted between adjacent input strings. LISTAGG(<expression>, <separator>) WITHIN GROUP(ORDER BY …) LISTAGG is supported in many databases and data warehouses. However, it i...
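To make the aggregation described above concrete, here is a minimal plain-Python sketch of LISTAGG-style semantics: group rows by a key, order the strings within each group, and join them with a separator. The `listagg` helper, the row layout, and the sample data are all hypothetical and are not taken from the article. In Spark SQL, a comparable result is commonly built from `collect_list`, `sort_array`, and `concat_ws`, though the article's own approach is not shown in this excerpt.

```python
from collections import defaultdict


def listagg(rows, key, value, separator=","):
    """Hypothetical helper: group rows by `key` and join each group's
    `value` strings in sorted order, mimicking
    LISTAGG(value, separator) WITHIN GROUP (ORDER BY value)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Sort each group before concatenating, as WITHIN GROUP (ORDER BY ...) would.
    return {k: separator.join(sorted(v)) for k, v in groups.items()}


# Illustrative sample data (an assumption, not from the article).
rows = [
    {"dept": "eng", "name": "Mia"},
    {"dept": "eng", "name": "Ann"},
    {"dept": "ops", "name": "Raj"},
]

result = listagg(rows, "dept", "name", separator=", ")
print(result)
```

Each key maps to a single string, so the "eng" group above collapses to its two names joined by the separator in alphabetical order.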
1 min reading time
Error when downloading full results after join
Problem You are working with two tables in a notebook. You perform a join. You can preview the output, but when you try to Download full results you get an error. Error in SQL statement: AnalysisException: Found duplicate column(s) when inserting into dbfs:/databricks-results/ Reproduce error Create two tables. %python from pyspark.sql.functions impo...
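The error message points at repeated column names in the joined output. As a rough illustration, the plain-Python sketch below checks a list of column names for duplicates before a download is attempted. The `duplicate_columns` helper and the sample column lists are hypothetical. The case-insensitive comparison assumes Spark's default setting, and in a notebook the list would typically come from a DataFrame's `columns` attribute.

```python
from collections import Counter


def duplicate_columns(columns):
    """Hypothetical helper: return the column names that occur more than once.

    Names are compared case-insensitively, on the assumption that the
    cluster uses Spark's default case-insensitive column resolution.
    """
    counts = Counter(name.lower() for name in columns)
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative column lists for two joined tables (assumed, not from the article).
left_cols = ["id", "name", "amount"]
right_cols = ["id", "region"]

# A join that keeps every column from both sides yields the concatenated list.
joined_cols = left_cols + right_cols
dupes = duplicate_columns(joined_cols)
print(dupes)
```

Any name this check reports would need to be renamed or dropped on one side of the join so that each output column is unique.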
0 min reading time
Inconsistent timestamp results with JDBC applications
Problem When using JDBC applications with Databricks clusters, you see inconsistent java.sql.Timestamp results when switching between standard time and daylight saving time. Cause Databricks clusters use UTC by default. java.sql.Timestamp uses the JVM’s local time zone. If a Databricks cluster returns 2021-07-12 21:43:08 as a string, the JVM parses i...
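The mismatch described above can be illustrated outside the JVM. The sketch below uses Python's standard library, not Java, to read the same timestamp string once as UTC and once in a client time zone, then measures the gap between the two instants. The fixed UTC-7 offset is a hypothetical client zone chosen only for illustration. A real zone with daylight saving rules changes its offset during the year, which is why the skew would not stay constant.

```python
from datetime import datetime, timedelta, timezone

# Timestamp string as the cluster returns it (the cluster uses UTC by default).
raw = "2021-07-12 21:43:08"
naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

# Intended meaning: the wall-clock value is in UTC.
as_utc = naive.replace(tzinfo=timezone.utc)

# Hypothetical client zone: a fixed UTC-7 offset, assumed for illustration.
client_zone = timezone(timedelta(hours=-7))

# What a client does when it reads the same string in its local zone.
as_local = naive.replace(tzinfo=client_zone)

# The two readings name different instants; the gap equals the zone offset.
skew = as_local - as_utc
print(skew)  # 7:00:00
```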
0 min reading time