Problem
When querying a BigQuery table from a Databricks cluster running Databricks Runtime 16.2 or above, the query fails with a ClassCastException error.
Example query
SELECT * FROM industry.employees LIMIT 10
Example error
ClassCastException: class org.apache.spark.sql.types.StringType$ cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.StringType$ and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
Cause
You have a table and a column that share the same name. In this example, the table industry.employees contains a column that is also named employees.
When a SQL query references a table and a column with the same name without being explicit, the reference is ambiguous to BigQuery. This is a known issue with the BigQuery storage adapter.
For more detail and context, refer to the Google forum topic "Issue create view when the source table and column have the same name".
Solution
The issue is not on the Databricks side. Databricks ships an open source (OSS) connector built by Google.
Google suggests two options to fix this issue. The example code lines below continue the hardcoded example from the problem to demonstrate the changes. Replace employees and industry.employees with your own column and table names.
The first option is to rename the column with an alias in the query so the reference is no longer ambiguous.
SELECT employees AS emp FROM industry.employees
The second option is to fully qualify the column name with the dataset and table name.
SELECT industry.employees.employees FROM industry.employees
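If you submit queries to BigQuery from Databricks through the open source spark-bigquery-connector, you can pass the rewritten query directly instead of reading the whole table. The following PySpark sketch is an illustration only: it assumes the connector's query option together with the viewsEnabled and materializationDataset settings, your_scratch_dataset is a placeholder for a dataset the connector can use to materialize query results, and cluster-level Google credentials are assumed to be configured already. The same pattern works with the aliased query from the first option.

# In a Databricks notebook, `spark` is already defined.
# Settings assumed by this sketch; adjust the names to your environment.
spark.conf.set("viewsEnabled", "true")                             # required for query-based reads
spark.conf.set("materializationDataset", "your_scratch_dataset")   # placeholder dataset for materialized results

fixed_query = "SELECT industry.employees.employees FROM industry.employees"

df = (
    spark.read.format("bigquery")
    .option("query", fixed_query)   # push the disambiguated query down to BigQuery
    .load()
)
df.show(10)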
If the issue persists despite the suggested steps, contact Google Support through their forums.