Stream to stream join failure

Avoid using a memory sink when running streaming queries that contain a stream-stream join.

Written by harikrishnan.kunhumveettil

Last published at: January 18th, 2024

Problem

You get an error when you try to display a streaming DataFrame that is the result of a stream-stream join.

Cause

When you call the display method on a Structured Streaming DataFrame, it uses complete output mode and a memory sink by default. Stream-stream joins do not support complete output mode. They support append mode only.

Solution

Instead of calling display, use the writeStream method to write the joined DataFrame to a sink, and explicitly set the output mode to append.

For more information, refer to the Python example from the Append mode (AWS | Azure | GCP) documentation.

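%python

# Here `events` is assumed to be the streaming DataFrame returned by the stream-stream join.
(events.writeStream
   .format("delta")
   .outputMode("append")  # append is the only output mode stream-stream joins support
   .option("checkpointLocation", "/tmp/delta/_checkpoints/")
   .start("/delta/events")
)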