Problem
Users may encounter inconsistent results when running the same query on Spark SQL and SQL, particularly when using the rlike operator (AWS | Azure | GCP) with any regular expression pattern.
Cause
This issue occurs because Spark SQL and SQL treat backslash characters in regular expression patterns differently. When passing a query programmatically (for example, in Scala or Python), the backslash character needs extra escaping in order to be interpreted correctly.
Solution
To resolve this issue, make sure you properly escape the backslash character in your rlike
patterns, especially when passing queries programmatically. The exact escaping required will depend on the environment or language you are using to run the query. Each backslash needs proper escaping to have the correct result.
Example SQL
% python
column_state rlike "(PLATINUM|BRONZE)(\\d?)AVAIL"
Example Spark.sql
% python
column_state rlike '(PLATINUM|BRONZE)(\\\\d?)AVAIL'