Problem
You are migrating jobs from unsupported clusters running Databricks Runtime 6.6 or below (Apache Spark 2.4.5 or below) to clusters running a current version of the Databricks Runtime.
If your jobs or notebooks perform date conversions, they may fail with a SparkUpgradeException error when they run on the upgraded clusters.
Error in SQL statement: SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'YYYY-MM-DD' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
Cause
Prior to Spark 3.0, Spark used a hybrid of the Julian and Gregorian calendars: dates before the Gregorian cutover of October 15, 1582 used the Julian calendar, and later dates used the Gregorian calendar.
In Spark 3.0 and above, Spark uses the Proleptic Gregorian calendar. This calendar is also used by other systems such as Apache Arrow, pandas, and R.
This legacy hybrid calendar behavior was inherited from the java.sql.Date API, which was superseded in Java 8 by java.time.LocalDate; the newer API uses the Proleptic Gregorian calendar.
Solution
You should update the datetime patterns in your jobs and notebooks so they are compatible with Spark 3.0 and above.
For example, trying to parse a date with the pattern YYYY-MM-DD raises an error in Spark 3.0 and above.
select TO_DATE('2017-01-01', 'YYYY-MM-DD') as date
Using the format yyyy-MM-dd works correctly in Spark 3.0 and above.
select TO_DATE('2017-01-01', 'yyyy-MM-dd') as date
The difference in capitalization may appear minor, but it matters to Spark: in a datetime pattern, D references the day-of-year, while d references the day-of-month. Likewise, y references the year, while Y references the week-based year.
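To see the difference between D and d directly, you can format the same date with both letters. This is a minimal illustration; the date_format call and the example date August 1, 2017 are only for demonstration.

select date_format(DATE '2017-08-01', 'D') as day_of_year, -- day-of-year, e.g. 213
       date_format(DATE '2017-08-01', 'd') as day_of_month -- day-of-month, e.g. 1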
Review the complete list of Spark datetime patterns for formatting and parsing (https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html) for more details.
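If you cannot update every pattern immediately, the error message also points to a temporary fallback: setting spark.sql.legacy.timeParserPolicy to LEGACY restores the pre-Spark 3.0 parser. A minimal sketch of the session-level setting is shown below; fixing the patterns themselves remains the recommended long-term solution.

set spark.sql.legacy.timeParserPolicy = LEGACY; -- restore pre-Spark 3.0 datetime parsing for this session
select TO_DATE('2017-01-01', 'YYYY-MM-DD') as date -- parsed by the legacy parser; the result may still differ from the corrected yyyy-MM-dd pattern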