Streaming (GCP) - Databricks

Append output is not supported without a watermark

Append output mode is not supported on aggregated DataFrames without a watermark....

Last updated: May 17th, 2022 by Adam Pavlacka

Apache Spark DStream is not supported

DStreams are not supported in Databricks. Migrate from DStream API to Structured Streaming....

Last updated: May 17th, 2022 by Adam Pavlacka

Streaming with File Sink: Problems with recovery if you change checkpoint or output directories

Learn how to resolve issues that occur with recovery if you change checkpoint or output directories when streaming with File Sink....

Last updated: May 17th, 2022 by Adam Pavlacka

Get the path of files consumed by Auto Loader

Get the path and filename of all files consumed by Auto Loader and write them out as a new column....

Last updated: May 18th, 2022 by Adam Pavlacka

How to restart a structured streaming query from last written offset

Learn how to restart a structured streaming query from the last written offset....

Last updated: May 18th, 2022 by Adam Pavlacka

Kafka error: No resolvable bootstrap urls

A 'No resolvable bootstrap urls' error occurs when you try to read or write data to a Kafka stream....

Last updated: May 18th, 2022 by Adam Pavlacka

readStream() is not whitelisted error when running a query

readStream() is not whitelisted error on clusters that have table access control enabled....

Last updated: May 19th, 2022 by mathan.pillai

Checkpoint files not being deleted when using display()

Learn how to prevent display(streamingDF) checkpoint files from using a large amount of storage....

Last updated: May 19th, 2022 by Adam Pavlacka

Checkpoint files not being deleted when using foreachBatch()

Learn how to prevent foreachBatch() checkpoint files from using a large amount of storage....

Last updated: May 19th, 2022 by Adam Pavlacka

Conflicting directory structures error

You should use distinct paths in the storage location, otherwise conflicting directory structures may result in an error....

Last updated: May 19th, 2022 by ashish

RocksDB fails to acquire a lock

When using RocksDB as a state store, you may need to increase the acquire timeout in the SQL config....

Last updated: February 25th, 2023 by Adam Pavlacka

Streaming job gets stuck writing to checkpoint

Streaming job appears to be stuck even though no error is thrown. You are using DBFS for checkpoint storage, but it has filled up....

Last updated: May 19th, 2022 by Jose Gonzalez

Explicit path to data or a defined schema required for Auto loader

If you do not specify an explicit path to your data or define your data schema, you get an IllegalArgumentException error when you start an Auto loader job....

Last updated: October 12th, 2022 by Jose Gonzalez

Optimize streaming transactions with .trigger

Use .trigger to define the storage update interval. A higher value reduces the number of storage transactions....

Last updated: October 26th, 2022 by chetan.kardekar

Structured streaming jobs slow down on every 10th batch

Automatic compaction of the metadata folder can slow down structured streaming jobs....

Last updated: October 28th, 2022 by gopinath.chandrasekaran

Get last modification time for all files in Auto Loader and batch jobs

Define a UDF to list all files in the path and return the last modification time for each one....

Last updated: December 1st, 2022 by DD Sharma

Stream to stream join failure

Avoid using a memory sink when running streaming queries with stream to stream join....

Last updated: January 18th, 2024 by harikrishnan.kunhumveettil

Offset reprocessing issues in streaming queries with a Kafka source

Resolve Kafka offset reprocessing issues in Structured Streaming by using a new checkpoint directory....

Last updated: January 19th, 2024 by harikrishnan.kunhumveettil

Autoloader job fails with a URISyntaxException error due to invalid characters in filenames

When using Directory listing mode you should not process files with colons in the filename. ...

Last updated: January 19th, 2024 by harikrishnan.kunhumveettil

Auto Loader streaming job failure with schema inference error

To selectively read a specific type of file using Auto Loader, use the pathGlobFilter option....

Last updated: February 29th, 2024 by harikrishnan.kunhumveettil

Auto Loader streaming query failure with unknownFieldException error

Use schema evolution to avoid streaming query failures when new columns are added to your data....

Last updated: February 29th, 2024 by harikrishnan.kunhumveettil

Casting string to date/timestamp in DLT pipeline does not throw an error

Configure the Delta Live Tables pipeline to enforce ANSI SQL compliance. ...

Last updated: September 23rd, 2024 by anudeep.konaboina

Auto Loader fails to pick up new files when using directory listing mode

Use file notification mode or disable incremental listing....

Last updated: September 12th, 2024 by brock.baurer

Incorrect input record count in Apache Spark streaming application logs/micro-batch metrics

Optimize actions on the DataFrame within the foreachBatch function. ...

Last updated: September 12th, 2024 by potnuru.siva

Upgrading to 14.3 LTS gives the error "com.databricks.sql.cloudfiles.errors.CloudFilesIllegalArgumentException"

Choose to configure either manually or via schema evolution....

Last updated: September 12th, 2024 by lucas.rocha

Streaming application missing data from a Delta table when writing to a given destination

Restart a streaming query on a new checkpoint folder with startingVersion option pointing to the next Delta version (X+1)....

Last updated: September 23rd, 2024 by gopinath.chandrasekaran

How to efficiently manage state store files in Apache Spark streaming applications

Control the lifecycle of state store files using streaming configurations...

Last updated: September 10th, 2024 by lingeswaran.radhakrishnan

Auto Loader failures with java.io.FileNotFoundException for SST and log files

Use a separate checkpoint folder outside of the Delta directory....

Last updated: November 4th, 2024 by kuldeep.mishra

Stateful Structured Streaming jobs fail after making changes to stateful operations

Avoid changes to stateful operations between restarts....

Last updated: November 17th, 2024 by brock.baurer

Structured Streaming job fails with a Streaming Query Exception when a schema changes in the source table

Enable schema tracking and set allowSourceColumnRenameAndDrop to true....

Last updated: December 2nd, 2024 by shanmugavel.chandrakasu

Duplicates appearing in Auto Loader with file notification feature despite set backfill interval

Remove the allowOverwrites configuration or implement a deduplicate logic....

Last updated: December 12th, 2024 by raul.goncalves

writeStream/readStream leads to an error when the schema contains “NullType”

Replace NullType with a literal None value and then cast it to StringType. ...

Last updated: December 24th, 2024 by G Yashwanth Kiran

Resumed streaming job fails after pause with StreamingQueryException error

Avoid pausing streaming jobs for longer than the delta.logRetentionDuration value, or restart the stream with a new checkpoint location....

Last updated: December 26th, 2024 by jayant.sharma

DLT pipeline is very slow when using Auto Loader and a Glob filter

Configure Auto Loader with file notification mode....

Last updated: March 25th, 2025 by lucas.rocha

Getting an InconsistentReadException error after updating to Databricks Runtime 13.3 LTS or above

Disable file status caching to reduce the time between file status checks. ...

Last updated: January 25th, 2025 by Raphael Freixo

Using glob patterns for directory filtering impacting Auto Loader performance

Use a more specific root path to reduce the scope of the initial scan....

Last updated: January 29th, 2025 by avi.yehuda

[CONCURRENT_QUERY] Error on Auto Loader job

Set the Auto Loader job to be configured to run in "continuous" mode instead of "available now" mode....

Last updated: January 31st, 2025 by Guilherme Leite

Structured Streaming workflow reading data from CDC is failing

Set spark.databricks.streaming.stateStore.stateSchemaCheck.ignoreNullCompatibility to true....

Last updated: January 31st, 2025 by sidhant.sahu

Stateful Streaming query failing with SparkSecurityException error after restarting on a shared cluster or serverless

Switch the cluster access mode....

Last updated: February 4th, 2025 by shanmugavel.chandrakasu

Receiving com.databricks.sql.io.FileReadException: Error while reading file on streaming queries

Ensure the delta.deletedFileRetentionDuration value is longer than the time it takes your query to complete....

Last updated: February 7th, 2025 by raphael.balogo

Auto Loader (file notification mode) fails to identify new files from the cloud queue service

Ensure that messages in the cloud queue are of the expected format....

Last updated: March 12th, 2025 by brock.baurer

Unable to use fields with qualifiers in the DLT Apply Changes API

Ensure that qualifiers are extracted prior to reference in the Apply Changes definition....

Last updated: March 19th, 2025 by brock.baurer

NoSuchMethodError import failure of Protobuf-java when trying to run Apache Spark workflow

Shade the Protobuf class in the JAR....

Last updated: April 24th, 2025 by Raphael Freixo

Streaming job failing with error "org.rocksdb.RocksDBException: Too many open files"

Make changes to Apache Spark configurations applicable to Auto Loader, state store, or both....

Last updated: May 28th, 2025 by jayant.sharma

Structured Streaming does not process batch size reduction after a failed transaction

Delete the uncommitted offset file from the checkpoint location and restart the stream....

Last updated: July 1st, 2025 by Tarun Sanjeev

Timeout error when integrating Kafka with Apache Spark Structured Streaming

Upgrade Kafka brokers....

Last updated: July 10th, 2025 by saritha.shivakumar

How to retrieve DLT pipeline details using Python and the Databricks API

Use the provided code....

Last updated: July 24th, 2025 by anudeep.konaboina

Streaming job failing with “Job terminated with exception” error

Restart the stream with a new checkpoint, or for DLT pipelines do a full refresh....

Last updated: July 24th, 2025 by anudeep.konaboina

Databricks Help Center

Append output is not supported without a watermark

Apache Spark DStream is not supported

Streaming with File Sink: Problems with recovery if you change checkpoint or output directories

Get the path of files consumed by Auto Loader

How to restart a structured streaming query from last written offset

Kafka error: No resolvable bootstrap urls

readStream() is not whitelisted error when running a query

Checkpoint files not being deleted when using display()

Checkpoint files not being deleted when using foreachBatch()

Conflicting directory structures error

RocksDB fails to acquire a lock

Streaming job gets stuck writing to checkpoint

Explicit path to data or a defined schema required for Auto loader

Optimize streaming transactions with .trigger

Structured streaming jobs slow down on every 10th batch

Get last modification time for all files in Auto Loader and batch jobs

Stream to stream join failure

Offset reprocessing issues in streaming queries with a Kafka source

Autoloader job fails with a URISyntaxException error due to invalid characters in filenames

Auto Loader streaming job failure with schema inference error

Auto Loader streaming query failure with unknownFieldException error

Casting string to date/timestamp in DLT pipeline does not throw an error

Auto Loader fails to pick up new files when using directory listing mode

Incorrect input record count in Apache Spark streaming application logs/micro-batch metrics

Upgrading to 14.3 LTS gives the error "com.databricks.sql.cloudfiles.errors.CloudFilesIllegalArgumentException"

Streaming application missing data from a Delta table when writing to a given destination

How to efficiently manage state store files in Apache Spark streaming applications

Auto Loader failures with java.io.FileNotFoundException for SST and log files

Stateful Structured Streaming jobs fail after making changes to stateful operations

Structured Streaming job fails with a Streaming Query Exception when a schema changes in the source table

Duplicates appearing in Auto Loader with file notification feature despite set backfill interval

writeStream/readStream leads to an error when the schema contains “NullType”

Resumed streaming job fails after pause with StreamingQueryException error

DLT pipeline is very slow when using Auto Loader and a Glob filter

Getting an InconsistentReadException error after updating to Databricks Runtime 13.3 LTS or above

Using glob patterns for directory filtering impacting Auto Loader performance

[CONCURRENT_QUERY] Error on Auto Loader job

Structured Streaming workflow reading data from CDC is failing

Stateful Streaming query failing with SparkSecurityException error after restarting on a shared cluster or serverless

Receiving com.databricks.sql.io.FileReadException: Error while reading file on streaming queries

Auto Loader (file notification mode) fails to identify new files from the cloud queue service

Unable to use fields with qualifiers in the DLT Apply Changes API

NoSuchMethodError import failure of Protobuf-java when trying to run Apache Spark workflow

Streaming job failing with error "org.rocksdb.RocksDBException: Too many open files"

Structured Streaming does not process batch size reduction after a failed transaction

Timeout error when integrating Kafka with Apache Spark Structured Streaming

How to retrieve DLT pipeline details using Python and the Databricks API

Streaming job failing with “Job terminated with exception” error

Contact Us