Apache Spark JDBC datasource query
option doesn’t work for Oracle database
Problem
When you use the query
option with the Apache Spark JDBC datasource to connect to Oracle Database, it fails with this error:
java.sql.SQLSyntaxErrorException: ORA-00911: invalid character
For example, if you run the following to make a JDBC connection:
val df = spark.read
.format("jdbc")
.option("url", "<url>")
.option("query", "SELECT * FROM oracle_test_table)")
.option("user", "<user>")
.option("password", "<password>")
.option("driver", "oracle.jdbc.driver.OracleDriver")
.load()
df.show()
You will see this error Message:
java.sql.SQLSyntaxErrorException: ORA-00911: invalid character
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
Cause
The error is due to a Spark-generated subquery alias (generated with the query
option) that does not conform to Oracle Database identifier naming conventions. This bug is tracked in Spark Jira ticket SPARK-27596 .
Solution
This issue is fixed in Apache Spark 2.4.4 and Databricks Runtime 5.4.
For clusters running on earlier versions of Spark or Databricks Runtime, use the dbtable
option instead of the query
option. The query must be enclosed in parentheses as a subquery.
val df = spark.read
.format("jdbc")
.option("url", "<url>")
.option("dbtable", "(SELECT * FROM oracle_test_table)")
.option("user", "<user>")
.option("password", "<password>")
.option("driver", "oracle.jdbc.driver.OracleDriver")
.load()
df.show()
You can try the same workaround for other databases when the query
option fails.