Problem
You're trying to execute a SQL query with a column alias using double quotes, such as the following.
SELECT 1 AS "a"
The query results in a syntax error.
[PARSE_SYNTAX_ERROR] Syntax error at or near '"a"'. SQLSTATE: 42601
Cause
By default, Apache Spark does not allow the use of double quotes for identifiers in SQL queries. Spark interprets double quotes as string literals instead of identifiers, resulting in a syntax error.
Solution
Enable the doubleQuotedIdentifiers
setting in your Spark configuration. At the same time, set spark.sql.ansi.enabled
to true
. The doubleQuotedIdentifiers
setting requires ANSI mode to be enabled. For details, refer to the Spark Configuration documentation.
You can set the options at the notebook level. Run the following commands on a new cell on top of your workload.
%python
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.conf.set("spark.sql.doubleQuotedIdentifiers", "true")