Problem
When you use the query option with the Apache Spark JDBC datasource to connect to an Oracle Database, it fails with this error:
java.sql.SQLSyntaxErrorException: ORA-00911: invalid character
For example, if you run the following to make a JDBC connection:
%scala val df = spark.read .format("jdbc") .option("url", "<url>") .option("query", "SELECT * FROM oracle_test_table)") .option("user", "<user>") .option("password", "<password>") .option("driver", "oracle.jdbc.driver.OracleDriver") .load() df.show()
You will see this error Message:
java.sql.SQLSyntaxErrorException: ORA-00911: invalid character at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396) at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951) at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
Cause
The error is due to a Spark-generated subquery alias (generated with the query option) that does not conform to Oracle Database identifier naming conventions. This bug is tracked in Spark Jira ticket SPARK-27596.
Solution
This issue is fixed in Apache Spark 2.4.4 and Databricks Runtime 5.4.
For clusters running on earlier versions of Spark or Databricks Runtime, use the dbtable option instead of the query option. The query must be enclosed in parentheses as a subquery.
%scala val df = spark.read .format("jdbc") .option("url", "<url>") .option("dbtable", "(SELECT * FROM oracle_test_table)") .option("user", "<user>") .option("password", "<password>") .option("driver", "oracle.jdbc.driver.OracleDriver") .load() df.show()
You can try the same workaround for other databases when the query option fails.