readStream() is not whitelisted error when running a query

Structured streaming queries fail with a readStream() is not whitelisted error on clusters that have table access control enabled.

Written by mathan.pillai

Last published at: May 19th, 2022

Problem

You have table access control (AWS | Azure | GCP) enabled on your cluster.

You are trying to run a structured streaming query and get an error message.

py4j.security.Py4JSecurityException: Method public org.apache.spark.sql.streaming.DataStreamReader org.apache.spark.sql.SQLContext.readStream() is not whitelisted on class class org.apache.spark.sql.SQLContext
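If you want to recognize this failure programmatically, for example in a job wrapper that logs a clearer hint, one option is to match on the exception text. The following is a minimal sketch that only inspects the message string quoted above; the helper name blocked_method is hypothetical and not part of any Databricks or py4j API.

```python
import re

# Matches the message format quoted above:
# "Method ... <qualified.name>(...) is not whitelisted on class ..."
_NOT_WHITELISTED = re.compile(
    r"Method .*?([\w.]+)\(.*?\) is not whitelisted on class"
)


def blocked_method(message: str):
    """Return the fully qualified name of the blocked method.

    Returns None if the message does not look like a
    "not whitelisted" security error. Hypothetical helper.
    """
    match = _NOT_WHITELISTED.search(message)
    return match.group(1) if match else None
```

Passing str(exc) from a caught exception to blocked_method lets you check whether the blocked call is readStream before deciding how to report the failure.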

Cause

Streaming is not supported on clusters that have table access control enabled.

Access control allows you to set permissions for data objects on a cluster. It requires user interaction to validate and refresh credentials.

Because streaming queries run continuously, without a user present to refresh credentials, they are not supported on clusters with table access control.

Solution

Run streaming queries on a cluster that does not have table access control enabled.
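Before starting a stream, you may want to confirm that the cluster you are attached to does not have table access control enabled. The sketch below assumes that table access control is indicated by the Spark configuration key spark.databricks.acl.dfAclsEnabled; verify this against your workspace before relying on it. The function name table_acl_enabled is hypothetical, and it takes a plain dictionary so that it can be exercised without a cluster.

```python
# Assumption: table access control is indicated by this Spark config key.
TABLE_ACL_KEY = "spark.databricks.acl.dfAclsEnabled"


def table_acl_enabled(spark_conf: dict) -> bool:
    """Return True if the config mapping marks table access control as enabled.

    A missing key is treated as "not enabled". Hypothetical helper.
    """
    value = spark_conf.get(TABLE_ACL_KEY, "false")
    return str(value).strip().lower() == "true"
```

In a notebook, you could build the mapping from the live session with {TABLE_ACL_KEY: spark.conf.get(TABLE_ACL_KEY, "false")} and skip the readStream() call when the function returns True.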