Problem: Apache Spark JDBC Datasource query Option Doesn’t Work For Oracle Database

Problem

When you use the query option with the Apache Spark JDBC datasource to connect to Oracle Database, it fails with this error:

java.sql.SQLSyntaxErrorException: ORA-00911: invalid character

For example, if you run the following to make a JDBC connection:

val df = spark.read
  .format("jdbc")
  .option("url", "<url>")
  .option("query", "SELECT * FROM oracle_test_table)")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .load()
df.show()

You will see this error Message:

java.sql.SQLSyntaxErrorException: ORA-00911: invalid character
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
      at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
      at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
      at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)

Version

Apache Spark 2.4.3 and below; Databricks Runtime 5.3 and below.

Cause

The error is due to a Spark-generated subquery alias (generated with the query option) that does not conform to Oracle Database identifier naming conventions. This bug is tracked in Spark Jira ticket SPARK-27596 .

Solution

This issue is fixed in Apache Spark 2.4.4 and Databricks Runtime 5.4.

For clusters running on earlier versions of Spark or Databricks Runtime, use the dbtable option instead of the query option. The query must be enclosed in parentheses as a subquery.

val df = spark.read
  .format("jdbc")
  .option("url", "<url>")
  .option("dbtable", "(SELECT * FROM oracle_test_table)")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .load()
df.show()

You can try the same workaround for other databases when the query option fails.