Problem
You want to control the micro-batch size of a streaming table created within a Delta Live Tables (DLT) pipeline by using rate limiters, but it is not clear how to configure them in DLT.
Cause
The dlt.read_stream() function in Delta Live Tables (DLT) does not directly support rate limit options such as maxBytesPerTrigger. These options are typically used with spark.readStream() to limit the amount of data read in each micro-batch during streaming.
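For reference, this is how the option is typically applied in a plain Structured Streaming read outside DLT. A minimal sketch; the table name source_db.streamingSource is taken from the example below, and the "1g" value is illustrative:

%python
# Outside DLT, rate limit options are passed directly to spark.readStream.
# "1g" is a soft cap of roughly 1 GB of data per micro-batch (illustrative value).
df = (
    spark.readStream
    .option("maxBytesPerTrigger", "1g")
    .table("source_db.streamingSource")
)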
Solution
Use rate limiter options in spark.readStream() and reference the source table with the LIVE keyword.
Example
The following example shows how to use the `maxFilesPerTrigger` option in DLT.
%python
import dlt

# First streaming table: reads from the external source, limiting each
# micro-batch to one file.
@dlt.table
def dlt_test_target3():
    return (
        spark.readStream
        .option("maxFilesPerTrigger", 1)
        .table("source_db.streamingSource")
    )

# Second streaming table: reads from the first table. The LIVE keyword
# tells DLT the source is a table defined in this pipeline, and the rate
# limiter applies to that read as well.
@dlt.table
def dlt_test_target4():
    return (
        spark.readStream
        .option("maxFilesPerTrigger", 1)
        .table("LIVE.dlt_test_target3")
    )
In the above example, dlt_test_target3 is defined as a streaming table within the DLT pipeline and is used as the source for another streaming table, dlt_test_target4. Prefix the source table name with the LIVE keyword in spark.readStream to apply rate limiters such as maxFilesPerTrigger.
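If you need to limit micro-batches by data size rather than file count, the same pattern should also work with maxBytesPerTrigger, since the rate limit is applied through spark.readStream rather than dlt.read_stream(). A minimal sketch, reusing dlt_test_target3 from the example above; the table name dlt_test_target_bytes and the "512m" value are illustrative:

%python
import dlt

# Same pattern as above, but capping each micro-batch by size instead of
# file count. "512m" is an illustrative soft limit; tune it for your workload.
@dlt.table
def dlt_test_target_bytes():
    return (
        spark.readStream
        .option("maxBytesPerTrigger", "512m")
        .table("LIVE.dlt_test_target3")
    )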
Important
This code applies to DLT processes that use streaming tables as sources and pipelines configured in continuous mode (not in triggered mode).