Databricks Knowledge Base

Streaming (AWS)

These articles can help you with Structured Streaming and Spark Streaming (the legacy Apache Spark streaming feature).

21 Articles in this category

Append output is not supported without a watermark

Append output mode is not supported on aggregated DataFrames without a watermark....

Last updated: May 17th, 2022 by Adam Pavlacka

Apache Spark DStream is not supported

DStreams are not supported in Databricks. Migrate from DStream API to Structured Streaming....

Last updated: May 17th, 2022 by Adam Pavlacka

Streaming with File Sink: Problems with recovery if you change checkpoint or output directories

Learn how to resolve issues that occur with recovery if you change checkpoint or output directories when streaming with File Sink....

Last updated: May 17th, 2022 by Adam Pavlacka

Get the path of files consumed by Auto Loader

Get the path and filename of all files consumed by Auto Loader and write them out as a new column....

Last updated: May 18th, 2022 by Adam Pavlacka

How to set up Apache Kafka on Databricks

Learn how to set up Apache Kafka on Databricks....

Last updated: May 18th, 2022 by Adam Pavlacka

Handling partition column values while using an SQS queue as a streaming source

...

Last updated: May 18th, 2022 by Adam Pavlacka

How to restart a Structured Streaming query from last written offset

Learn how to restart a Structured Streaming query from the last written offset....

Last updated: May 18th, 2022 by Adam Pavlacka

How to switch an SNS streaming job to a new SQS queue

...

Last updated: May 18th, 2022 by Adam Pavlacka

Kafka error: No resolvable bootstrap urls

A 'No resolvable bootstrap urls' error occurs when you try to read or write data to a Kafka stream....

Last updated: May 18th, 2022 by Adam Pavlacka

readStream() is not whitelisted error when running a query

A 'readStream() is not whitelisted' error occurs on clusters that have table access control enabled....

Last updated: May 19th, 2022 by mathan.pillai

Checkpoint files not being deleted when using display()

Learn how to prevent display(streamingDF) checkpoint files from using a large amount of storage....

Last updated: May 19th, 2022 by Adam Pavlacka

Checkpoint files not being deleted when using foreachBatch()

Learn how to prevent foreachBatch() checkpoint files from using a large amount of storage....

Last updated: May 19th, 2022 by Adam Pavlacka

Conflicting directory structures error

You should use distinct paths in the storage location, otherwise conflicting directory structures may result in an error....

Last updated: May 19th, 2022 by ashish

RocksDB fails to acquire a lock

When using RocksDB as a state store, you may need to increase the acquire timeout in the SQL config....

Last updated: February 25th, 2023 by Adam Pavlacka

Stream XML files using an auto-loader

Stream XML files on Databricks by combining the auto-loading features of the Spark batch API with the OSS library Spark-XML....

Last updated: May 19th, 2022 by Adam Pavlacka

Streaming job using Kinesis connector fails

A streaming job writing to a Kinesis sink fails with an out-of-memory error because HTTP clients are not being terminated....

Last updated: May 19th, 2022 by ashish

Streaming job gets stuck writing to checkpoint

A streaming job appears to be stuck even though no error is thrown. You are using DBFS for checkpoint storage, but it has filled up....

Last updated: May 19th, 2022 by Jose Gonzalez

Explicit path to data or a defined schema required for Auto Loader

If you do not specify an explicit path to your data or define your data schema, you get an IllegalArgumentException error when you start an Auto Loader job....

Last updated: October 12th, 2022 by Jose Gonzalez

Optimize streaming transactions with .trigger

Use .trigger to define the storage update interval. A higher value reduces the number of storage transactions....

Last updated: October 26th, 2022 by chetan.kardekar
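
As a rough illustration of why a longer trigger interval reduces storage transactions, the sketch below estimates an upper bound on micro-batches per day. It assumes one storage commit per triggered micro-batch; the helper name transactions_per_day is hypothetical and not taken from the article.

```python
def transactions_per_day(trigger_interval_seconds: float) -> int:
    """Upper bound on micro-batches per day for a fixed trigger interval.

    Assumption (not from the article): each triggered micro-batch
    produces one storage commit, and a batch starts at every interval.
    """
    if trigger_interval_seconds <= 0:
        raise ValueError("trigger interval must be positive")
    seconds_per_day = 24 * 60 * 60
    return int(seconds_per_day // trigger_interval_seconds)


# Compare a few candidate intervals, e.g. the value you might pass as
# .trigger(processingTime="60 seconds") on a streaming write.
for interval in (1, 10, 60):
    print(f"{interval:>3}s interval -> at most {transactions_per_day(interval)} batches/day")
```

Under these assumptions, moving from a 1-second to a 60-second interval cuts the upper bound by a factor of 60. Real workloads may commit less often, since a micro-batch typically runs only when new data is available.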

Structured Streaming jobs slow down on every 10th batch

Automatic compaction of the metadata folder can slow down Structured Streaming jobs....

Last updated: October 28th, 2022 by gopinath.chandrasekaran

Get last modification time for all files in Auto Loader and batch jobs

Define a UDF to list all files in the path and return the last modification time for each one....

Last updated: December 1st, 2022 by DD Sharma


© Databricks 2022-2023. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
