Problem
When you run queries that read event logs from multiple tables within the same notebook cell on an interactive cluster, you receive an error message.
UnsupportedOperationException: Cannot read more than one event logs in the same query
Cause
Interactive clusters run Spark Connect, which allows you to execute DataFrame operations or queries on Apache Spark clusters from remote environments such as IDEs, notebooks, or applications.
Spark Connect processes all of the commands in a notebook cell as a single query. Because a single query cannot read more than one event log, a cell that queries event logs from multiple tables fails with this error.
Solution
Either split queries into separate notebook cells, or use serverless compute.
Split queries into separate cells
The following example demonstrates how to split the queries so each notebook cell is treated as a separate query.
Cell 1
sql = f"CREATE OR REPLACE TABLE {<your-target-db>}.event_log_{<your-table-name-query-1>} AS SELECT * FROM event_log(table({<your-source-db>}.{<your-table-name-source>}))"
spark.sql(sql)
Cell 2
sql = f"CREATE OR REPLACE TABLE {<your-target-db>}.event_log_{<your-table-name-query-2>} AS SELECT * FROM event_log(table({<your-source-db>}.{<your-table-name-source>}))"
spark.sql(sql)
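Both cells build the same statement with only the target suffix and source table changing. As an illustrative sketch (the helper name event_log_copy_sql and the sample database and table values below are hypothetical, not from your workspace), you can capture that pattern in a small function and call it once per cell:

```python
def event_log_copy_sql(target_db: str, source_db: str,
                       source_table: str, suffix: str) -> str:
    """Build one CREATE OR REPLACE TABLE ... event_log(...) statement.

    Hypothetical helper; all names passed in are placeholders you
    replace with your own databases and tables.
    """
    return (
        f"CREATE OR REPLACE TABLE {target_db}.event_log_{suffix} "
        f"AS SELECT * FROM event_log(table({source_db}.{source_table}))"
    )

# In each notebook cell, run exactly one statement, for example:
# spark.sql(event_log_copy_sql("my_target_db", "my_source_db",
#                              "my_source_table", "query_1"))
```

Keeping one spark.sql call per cell preserves the one-query-per-cell requirement on interactive clusters.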
Use serverless compute
Alternatively, you can use serverless compute. In serverless, Spark Connect executes each query independently, so multiple event log queries can run in the same cell. For more information, refer to the Serverless compute release notes (AWS | Azure | GCP) documentation.
Cell 1
tables = [<your-table1>, <your-table2>, <your-table3>]
for table in tables:
    sql = f"create or replace table {<your-target-db>}.event_log_{table} as select * from event_log(table({<your-source-db>}.{table}))"
    spark.sql(sql)
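If you want to inspect the generated statements before submitting them, the loop above can be sketched in pure Python. The database and table names below are placeholders for illustration only, not values from your workspace:

```python
# Placeholder names for illustration only.
target_db = "my_target_db"
source_db = "my_source_db"
tables = ["table_a", "table_b", "table_c"]

# Build one CREATE OR REPLACE TABLE statement per source table.
statements = [
    f"create or replace table {target_db}.event_log_{table} "
    f"as select * from event_log(table({source_db}.{table}))"
    for table in tables
]

# On serverless compute, each spark.sql(statement) call below would be
# executed as an independent query:
# for statement in statements:
#     spark.sql(statement)
```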