Different results when using rlike with regex in SQL queries vs Spark SQL queries

You must properly escape the backslash character in rlike patterns.

Written by wanderson.oliveira

Last published at: April 26th, 2025

Problem

Users may encounter inconsistent results when running the same query on Spark SQL and SQL, particularly when using the rlike operator (AWSAzureGCP) with any regular expression pattern.

 

Cause 

This issue occurs because Spark SQL and SQL treat backslash characters in regular expression patterns differently. When passing a query programmatically (for example, in Scala or Python), the backslash character needs extra escaping in order to be interpreted correctly.

 

Solution

To resolve this issue, make sure you properly escape the backslash character in your rlike patterns, especially when passing queries programmatically. The exact escaping required will depend on the environment or language you are using to run the query. Each backslash needs proper escaping to have the correct result.

 

Example SQL

% python

column_state rlike "(PLATINUM|BRONZE)(\\d?)AVAIL"

 

Example Spark.sql

% python

column_state rlike '(PLATINUM|BRONZE)(\\\\d?)AVAIL'