Unclear how to control micro-batch size on a streaming table in Delta Live Tables (DLT)

Use rate limiters along with the LIVE keyword.

Written by potnuru.siva

Last published at: September 9th, 2024

Problem 

You want to use rate limiters to control the micro-batch size on a streaming table that is created in the same Delta Live Tables (DLT) pipeline, but it is not clear how to achieve this in DLT.

Cause 

The dlt.read_stream() function in Delta Live Tables (DLT) does not directly support the maxBytesPerTrigger rate limit option. This option is typically used with spark.readStream() to limit the amount of data read in each micro-batch during streaming.
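For reference, this is how the option is normally applied outside DLT with plain Structured Streaming. This is a minimal sketch; the source table name is a placeholder:

%python
# Plain Structured Streaming (outside DLT): cap each micro-batch
# at roughly 1 GB of input. maxBytesPerTrigger is a soft maximum
# for Delta sources. "source_db.streamingSource" is a placeholder.
df = (
    spark.readStream
    .option("maxBytesPerTrigger", "1g")
    .table("source_db.streamingSource")
)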

Solution

Use rate limiters along with the LIVE keyword. 

Example

The following example shows how to use the maxFilesPerTrigger option in DLT.

%python
import dlt

@dlt.table
def dlt_test_target3():
    # Stream from the external source table, reading at most one file per micro-batch.
    return (
        spark.readStream
        .option("maxFilesPerTrigger", 1)
        .table("source_db.streamingSource")
    )

@dlt.table
def dlt_test_target4():
    # Stream from the upstream DLT table, referenced with the LIVE keyword,
    # applying the same rate limiter.
    return (
        spark.readStream
        .option("maxFilesPerTrigger", 1)
        .table("LIVE.dlt_test_target3")
    )


In the above example, dlt_test_target3 is defined as a streaming table within the DLT pipeline and is used as the source for another streaming table, dlt_test_target4. Prefix the source table name with the LIVE keyword in spark.readStream so that the maxFilesPerTrigger rate limiter is applied.
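The same LIVE-keyword pattern should also work with maxBytesPerTrigger. Here is a sketch assuming the tables defined above; dlt_test_target5 is a hypothetical table name:

%python
import dlt

@dlt.table
def dlt_test_target5():
    # Hypothetical example: cap each micro-batch read from the upstream
    # DLT table at roughly 1 GB. maxBytesPerTrigger is a soft maximum
    # for Delta sources.
    return (
        spark.readStream
        .option("maxBytesPerTrigger", "1g")
        .table("LIVE.dlt_test_target3")
    )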


Important

This code applies to DLT pipelines that use streaming tables as sources and are configured in continuous mode (not triggered mode).
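Continuous mode is configured in the pipeline settings. A minimal sketch of the relevant field in the pipeline's JSON settings, where the pipeline name is a placeholder:

{
  "name": "my_dlt_pipeline",
  "continuous": true
}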