Problem
When you replace a temporary view using Apache Spark Connect, a later action on the DataFrame fails with the following error:
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `col1` cannot be resolved. Did you mean one of the following? [`test_col`]. SQLSTATE: 42703
This error occurs even though the column exists in the query that originally defined the view.
Example code
In the following code, a temporary view is created and then replaced by a query that reads from that same view and is saved under the same name.
df = spark.sql("select 'test' as col1")
# Create the temporary view
df.createOrReplaceTempView('temp_view')
# Query the temporary view and save the result under the same view name
df = spark.sql("select col1 as test_col from temp_view")
df.createOrReplaceTempView('temp_view')
# Fails with UNRESOLVED_COLUMN in Spark Connect
df.count()
Cause
In Spark Connect, DataFrames and the temporary views they reference are analyzed lazily. A DataFrame refers to a temporary view only by name, and that name is not resolved until the plan is analyzed, for example when an action such as count() runs. A change to a temporary view is therefore not validated until the view is used.
In this example, the second df refers to temp_view by name. By the time df.count() runs, temp_view has been replaced by the new definition, which contains only test_col. The previous version of the view, which defined col1, is no longer reachable under that name, so col1 cannot be resolved and the unresolved column error is raised.
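The following plain-Python toy model is a hypothetical illustration of the lazy name resolution described above, not Spark's implementation. The class LazyFrame, its analyze method, and the dictionary catalog are invented for this sketch. A frame remembers only the name of its source view, and every analysis looks that name up in the catalog as it is at that moment, so the same sequence of steps as the example code fails on the final analysis.

```python
class UnresolvedColumnError(Exception):
    """Raised when a referenced column is missing from the resolved source."""


class LazyFrame:
    """Hypothetical stand-in for a Spark Connect DataFrame (toy model only).

    It stores an unresolved plan: either literal output columns, or the
    name of a source view plus a mapping {source column: output column}.
    """

    def __init__(self, catalog, columns=None, source=None, renames=None):
        self.catalog = catalog        # dict: view name -> resolved column list
        self.columns = columns        # output columns of a literal query
        self.source = source          # view name, looked up only at analysis time
        self.renames = renames or {}

    def analyze(self):
        """Resolve the plan against the catalog's current contents."""
        if self.source is None:
            return list(self.columns)
        available = self.catalog[self.source]
        for name in self.renames:
            if name not in available:
                raise UnresolvedColumnError(
                    f"`{name}` cannot be resolved. Did you mean one of {available}?"
                )
        return list(self.renames.values())

    def create_or_replace_temp_view(self, name):
        # The view keeps the columns produced by analyzing the plan right now.
        self.catalog[name] = self.analyze()


catalog = {}
df = LazyFrame(catalog, columns=["col1"])
df.create_or_replace_temp_view("temp_view")

df = LazyFrame(catalog, source="temp_view", renames={"col1": "test_col"})
df.create_or_replace_temp_view("temp_view")  # succeeds: temp_view still has col1

try:
    df.analyze()  # re-analysis looks up temp_view again; it now has only test_col
    error = None
except UnresolvedColumnError as exc:
    error = str(exc)

print(error)  # `col1` cannot be resolved. Did you mean one of ['test_col']?
```

Saving the second frame under a different view name (for example temp_view_2) leaves temp_view pointing at the version that still has col1, so the later analysis succeeds.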
Solution
When working with temporary views, use a unique name for each temporary view, and do not replace a view with a query that reads from that same view.
If possible, consider using DataFrames instead of temporary views. For more information, refer to the Tutorial: Load and transform data using Apache Spark DataFrames (AWS | Azure | GCP) documentation.