Databricks Help Center

Delta Lake

These articles can help you with Delta Lake.

73 Articles in this category

Cannot delete data using JDBC in Eclipse

Delete operations on a Delta table fail with a SparkJDBCDriver error when using JDBC in an Eclipse local environment....

Last updated: May 10th, 2022 by annapurna.hiriyur

Compare two versions of a Delta table

Use time travel to compare two versions of a Delta table....

Last updated: May 10th, 2022 by mathan.pillai

Converting from Parquet to Delta Lake fails

Converting a file from Parquet to Delta Lake fails with a partition error when you have a subdirectory. Expecting 0 partition column(s), but found 1 partition column(s)...

Last updated: May 10th, 2022 by Jose Gonzalez

Delta Merge cannot resolve nested field

Delta Merge fails with a `Delta Merge cannot resolve 'field' due to data type mismatch` error message....

Last updated: May 10th, 2022 by Adam Pavlacka

Delete your streaming query checkpoint and restart

Delta table doesn't exist. Please delete your streaming query checkpoint and restart....

Last updated: May 10th, 2022 by Adam Pavlacka

How Delta cache behaves on an autoscaling cluster

Learn how Delta cache behaves on an autoscaling cluster....

Last updated: May 10th, 2022 by Adam Pavlacka

How to improve performance of Delta Lake MERGE INTO queries using partition pruning

Learn how to use partition pruning to improve the performance of Delta Lake MERGE INTO queries....

Last updated: June 1st, 2023 by Adam Pavlacka

Best practices for dropping a managed Delta Lake table

Learn the best practices for dropping a managed Delta Lake table....

Last updated: May 10th, 2022 by Adam Pavlacka

HIVE_CURSOR_ERROR when reading a table in Athena

When you try to read a table in Athena, the select query returns a HIVE_CURSOR_ERROR message....

Last updated: May 10th, 2022 by annapurna.hiriyur

Access denied when writing Delta Lake tables to S3

Learn how to resolve an access denied 403 Forbidden error when writing Delta Lake tables to S3....

Last updated: May 10th, 2022 by Adam Pavlacka

Delta Lake write job fails with java.lang.UnsupportedOperationException

Learn how to prevent java.lang.UnsupportedOperationException in Delta Lake write jobs....

Last updated: May 10th, 2022 by Adam Pavlacka

How to populate or update columns in an existing Delta table

Learn how to populate or update columns in an existing Delta table....

Last updated: May 10th, 2022 by Adam Pavlacka

Identify duplicate data on append operations

...

Last updated: May 10th, 2022 by chetan.kardekar

Object lock error when writing Delta Lake tables to S3

Delta Lake does not support S3 buckets with object lock enabled. com.amazonaws.services.s3.model.AmazonS3Exception...

Last updated: May 10th, 2022 by ashritha.laxminarayana

Optimize a Delta sink in a structured streaming application

Optimize your Delta sink by using a mod value on the batchId to optimize when foreachBatch runs....

Last updated: May 10th, 2022 by mathan.pillai
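The mod-value idea in the summary above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the linked article: `make_batch_handler`, `write_batch`, `run_optimize`, and `optimize_every` are hypothetical names, and the two callbacks stand in for whatever write and optimization steps your job performs. The returned function has the `(batch_df, batch_id)` shape that a foreachBatch handler receives.

```python
from typing import Any, Callable


def make_batch_handler(
    write_batch: Callable[[Any], None],
    run_optimize: Callable[[], None],
    optimize_every: int = 10,
) -> Callable[[Any, int], None]:
    """Build a handler that writes every micro-batch but optimizes only periodically.

    All names here are hypothetical; the callbacks are supplied by the caller.
    """
    if optimize_every <= 0:
        raise ValueError("optimize_every must be a positive integer")

    def handle(batch_df: Any, batch_id: int) -> None:
        # Always write the micro-batch to the sink.
        write_batch(batch_df)
        # Run the expensive optimization step only when the batch id is a
        # multiple of optimize_every (this includes batch id 0).
        if batch_id % optimize_every == 0:
            run_optimize()

    return handle
```

A larger `optimize_every` means the optimization step runs less often, which trades file compaction frequency against per-batch latency.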

Delta Lake UPDATE query fails with IllegalState exception

Learn how to resolve an issue with Delta Lake UPDATE, DELETE, or MERGE queries that use Python UDFs....

Last updated: May 10th, 2022 by Adam Pavlacka

Unable to cast string to varchar

Use the varchar type in Databricks Runtime 8.0 and above. It can only be used in a table schema. It cannot be used in functions or operators....

Last updated: May 10th, 2022 by DD Sharma

Vacuuming with zero retention results in data loss

To avoid data loss, do not disable spark.databricks.delta.retentionDurationCheck.enabled and run VACUUM with a retention of zero....

Last updated: October 7th, 2022 by DD Sharma

Z-Ordering will be ineffective, not collecting stats

Z-Ordering is ineffective and returns an error about not collecting stats. Reorder the table so the columns you want to optimize on are within the first 32 columns....

Last updated: May 10th, 2022 by mathan.pillai

Change cluster config for Delta Live Tables pipeline

Customize the cluster configuration when using a Delta Live Tables pipeline....

Last updated: July 1st, 2022 by pratik.bhawsar

Different tables with same data generate different plans when used in same query

Ensure that tables with the same data generate the same physical plans with Spark SQL....

Last updated: October 14th, 2022 by deepak.bhutada

Allow spaces and special characters in nested column names with Delta tables

Upgrade to Databricks Runtime 10.2 or later and use column mapping mode to allow spaces and special characters in column names....

Last updated: October 26th, 2022 by shanmugavel.chandrakasu

Delta writing empty files when source is empty

Delta can write empty files under Databricks Runtime 7.3 LTS. You should upgrade to Databricks Runtime 9.1 LTS or above to resolve the issue....

Last updated: December 2nd, 2022 by Rajeev kannan Thangaiah

Delta Live Tables pipelines are not running VACUUM automatically

You must have a maintenance cluster defined for VACUUM to run automatically....

Last updated: February 2nd, 2023 by priyanka.biswas

VACUUM best practices on Delta Lake

Learn best practices for using and troubleshooting VACUUM on Delta Lake....

Last updated: February 3rd, 2023 by mathan.pillai

OPTIMIZE is only supported for Delta tables error on Delta Lake

Use CREATE OR REPLACE TABLE when moving Delta tables from one storage location to another....

Last updated: February 3rd, 2023 by mathan.pillai

FileReadException when reading a Delta table

A FileReadException error occurs when you attempt to read from a Delta table. The underlying data has been deleted, or the storage blob was unmounted during a write....

Last updated: February 23rd, 2023 by Adam Pavlacka

Programmatically determine if a table is a Delta table or not

Use Python code in a Databricks notebook to determine if a table is a Delta table or not....

Last updated: March 16th, 2023 by mounika.tarigopula

RESOURCE_LIMIT_EXCEEDED error when querying a Delta Sharing table

Delta Sharing has limits on the metadata size of a shared table. If you exceed these limits it generates an error....

Last updated: April 19th, 2023 by Rajeev kannan Thangaiah

Found duplicate columns error blocks creation of a Delta table

Duplicate column names are not allowed in Delta tables....

Last updated: July 28th, 2023 by deepak.bhutada

Hive-style partitions not found on Delta table after enabling column mapping mode

Delta Lake column mapping does not support Hive-style partitions....

Last updated: February 21st, 2024 by Jose Gonzalez

Dropping and recreating Delta tables results in a DeltaVersionsNotContiguousException error

Instead of dropping and recreating Delta tables, use the CREATE OR REPLACE command....

Last updated: March 24th, 2025 by sidhant.sahu

The delta.retentionDurationCheck property is not recognized when using serverless compute

Use VACUUM or table properties to handle retention instead. ...

Last updated: September 23rd, 2024 by Rajeev kannan Thangaiah

"AnalysisException: Incompatible Format Detected" error when writing to OpenSearch

Make sure there is no _delta_log folder in your root directory....

Last updated: September 23rd, 2024 by kuldeep.mishra

AnalysisException error due to a schema mismatch

Modify the write command and set the mergeSchema property to true....

Last updated: September 23rd, 2024 by ram.sankarasubramanian

Timestamp change to underlying Apache Parquet/change data files while using Change Data Capture (CDC)

For timestamp-based queries, ensure that the original file timestamps are preserved during the migration process....

Last updated: September 12th, 2024 by raphael.balogo

Unable to read Delta table with deletion vectors

Use a cluster running Databricks Runtime 12.2 LTS through 15.3 to query all deletion-vector-enabled Delta tables....

Last updated: September 12th, 2024 by Ravivarma S

VACUUM operations not performing even after enabling predictive optimization

Create a new deletion operation to trigger VACUUM after enabling predictive optimization....

Last updated: March 28th, 2025 by Sahil Singh

Unknown Apache Spark internal error when running Delta table queries

Reorganize the folder structure for the specific partition causing the problem. ...

Last updated: November 4th, 2024 by Guilherme Leite

Running OPTIMIZE on Delta tables causing ConcurrentDeleteDeleteException error

...

Last updated: December 10th, 2024 by Vidhi Khaitan

DELTA_CLUSTERING_COLUMN_MISSING_STATS error when attempting to define liquid clustering for a Delta table

Ensure that you have generated Delta statistics for the columns used as clustering keys....

Last updated: December 11th, 2024 by jessica.santos

INSERT operation fails while trying to execute multiple concurrent INSERT or MERGE operations to append data

Make sure the isolation levels are correctly set or refactor to remove conflicts....

Last updated: December 12th, 2024 by caio.cominato

You do not use deletion vectors, but see a file named deletion vector in your data path

It is an artifact of low shuffle merge and is removed on the next VACUUM run....

Last updated: December 13th, 2024 by avi.yehuda

Databricks Runtime is not able to read data in a format other than Delta

Delete the transactional log folder or move it to a different location. ...

Last updated: January 25th, 2025 by sidhant.sahu

Running a Python UDF fails with permission error

Ask the function owner to grant EXECUTE permission....

Last updated: December 23rd, 2024 by shubham.bhusate

COUNT operation on a DataFrame returning zero or incorrect number of records

Schedule operations to run sequentially, save the DataFrame to a checkpoint, and/or use snapshot isolation....

Last updated: December 23rd, 2024 by nelavelli.durganagajahnavi

Error [DELTA_CLUSTERING_SHOW_CREATE_TABLE_WITHOUT_CLUSTERING_COLUMNS] when running SHOW CREATE TABLE command

Upgrade to Databricks Runtime 15 or above or disable the liquid clustering table feature....

Last updated: December 23rd, 2024 by manikandan.ganesan

FileReadException error when trying to run streaming job reading from system tables

Increase job frequency or the maxVersionsPerRpc....

Last updated: January 10th, 2025 by lucas.rocha

DeltaFileNotFoundException when reading a table

Ensure that Delta log files are not getting deleted prematurely....

Last updated: January 14th, 2025 by lucas.rocha

FileNotFoundException error when running a streaming job or reading a Delta table, even with ignoremissingfiles set

Use the FSCK repair command to synchronize the metadata with the actual data files....

Last updated: January 16th, 2025 by mounika.tarigopula

Slow S3 API responses when using Delta Lake with versioning-enabled S3 buckets

Disable S3 bucket versioning....

Last updated: January 17th, 2025 by Raghavan Vaidhyaraman

Syntax error when running vacuum with USING INVENTORY command

Upgrade your Databricks Runtime to version 15.2 or above. ...

Last updated: January 22nd, 2025 by jessica.santos

How to get the full size of a Delta table or partition

Use a Scala command and pass in either the root path or the path to the partition....

Last updated: January 22nd, 2025 by saritha.shivakumar

Error when trying to copy datasets to different regions using Delta Sharing

Split your intervals into separate columns for day, hour, minute, and second; convert the interval to a string; or convert the entire interval to seconds and store it as an integer. ...

Last updated: January 25th, 2025 by Rajeev kannan Thangaiah
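The three interval representations named in the summary above can be illustrated with the Python standard library. This is a minimal sketch, assuming the interval is available as a non-negative `datetime.timedelta`; the helper names are hypothetical and the linked article may use different code.

```python
from datetime import timedelta
from typing import Dict


def interval_to_seconds(interval: timedelta) -> int:
    """Collapse the whole interval into an integer number of seconds.

    Fractional seconds are truncated.
    """
    return int(interval.total_seconds())


def interval_to_parts(interval: timedelta) -> Dict[str, int]:
    """Split a non-negative interval into day, hour, minute, and second fields."""
    total = interval_to_seconds(interval)
    days, rem = divmod(total, 86_400)   # 86,400 seconds per day
    hours, rem = divmod(rem, 3_600)     # 3,600 seconds per hour
    minutes, seconds = divmod(rem, 60)
    return {"day": days, "hour": hours, "minute": minutes, "second": seconds}


def interval_to_string(interval: timedelta) -> str:
    """Store the interval as text, using Python's default formatting."""
    return str(interval)
```

The integer form is the most compact and is easy to convert back with `timedelta(seconds=n)`; the separate fields are easier to filter on.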

ETL process fails to process a column and throws error Row group size has overflowed

Reduce the default row group size and increase the frequency of size checks....

Last updated: January 25th, 2025 by Raphael Freixo

Error when trying to perform maintenance on a Delta table

Avoid manual maintenance, or disable predictive optimization if manual maintenance is necessary....

Last updated: January 29th, 2025 by avi.yehuda

InvalidSchemaException error when trying to insert data into a Delta table

Define a field type for any fields that use a StructType within a StructField....

Last updated: January 30th, 2025 by lucas.rocha

Job fails with ExecutorLostFailure error due to excessive garbage collection (GC)

Broadcast the smaller table instead of the larger one....

Last updated: January 31st, 2025 by Rajeev kannan Thangaiah

Parquet table last modification retrieval returns NULL

List the files within the parquet table path and sort them by the modificationTime column. ...

Last updated: February 7th, 2025 by Shyamprasad Miryala
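The approach in the summary above (list the files, sort by modification time, take the newest) can be sketched against a local directory with the Python standard library. This is an assumption-laden stand-in: it reads `st_mtime` from the local filesystem rather than the `modificationTime` column of a Databricks file listing, and `latest_modification` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional, Tuple


def latest_modification(table_path: str) -> Optional[Tuple[Path, float]]:
    """Return the most recently modified .parquet file and its mtime.

    Returns None when no Parquet files are found under table_path.
    """
    # Collect data files recursively so partition subdirectories are included.
    files = [p for p in Path(table_path).rglob("*.parquet") if p.is_file()]
    if not files:
        return None
    # Sort newest first by modification time (seconds since the epoch).
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    newest = files[0]
    return newest, newest.stat().st_mtime
```

The newest file's modification time then serves as an approximation of the table's last modification time.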

Time travel SELECT query works on older dates even after VACUUM

This is expected behavior but you can also test that your VACUUM command ran successfully. ...

Last updated: February 7th, 2025 by Shyamprasad Miryala

Unable to exclude columns from a table based on specific strings in the comments

Leverage DESCRIBE TABLE to retrieve and filter on the metadata which includes comments....

Last updated: February 7th, 2025 by Shyamprasad Miryala

Delta table as a streaming source returns error DELTA_FILE_NOT_FOUND_DETAILED even though no user or lifecycle rule has deleted files

Consume the source table from scratch, consume from a specific version, or load data using a specific timestamp as filter....

Last updated: February 27th, 2025 by avi.yehuda

Symlink format manifest fails when trying to enable liquid clustering on a table

Disable deletion vectors or avoid using Symlink format manifest, as appropriate for your context. ...

Last updated: March 27th, 2025 by avi.yehuda

DataFrame to Ray Dataset conversions taking a long time to execute

Set spark.task.resource.gpu.amount to 0, modify num_cpus_worker_node, or enable Spark cluster auto-scaling....

Last updated: April 1st, 2025 by Raghavan Vaidhyaraman

Files restored from a Delta table archive are not recognized by Delta with archival support enabled

Temporarily disable archival support or increase the value for the table property delta.timeUntilArchived....

Last updated: April 7th, 2025 by kaushal.vachhani

Job failing with DELTA_CHANGE_DATA_FILE_NOT_FOUND error

Use ignoreMissingFile config or a new checkpoint....

Last updated: April 9th, 2025 by sidhant.sahu

Column statistics missing when running ANALYZE TABLE COMPUTE STATISTICS after ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS

Only run ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS to ensure column statistics remain available. ...

Last updated: April 24th, 2025 by Guilherme Leite

Error “Job aborted due to stage failure” when trying to shallow clone a large table with significant DeltaLog entries

Increase spark.driver.maxResultSize to 6 GB or greater....

Last updated: April 24th, 2025 by caio.cominato

Iceberg metadata not reflecting latest changes made to table, appearing out of sync

Add `MSCK REPAIR TABLE SYNC METADATA` to the end of the ingestion job, or run manually after each ingestion....

Last updated: April 28th, 2025 by sidhant.sahu

Recover a dropped table when a new table is created with the same name

Rename the existing table and then UNDROP the deleted table....

Last updated: April 28th, 2025 by Raghavan Vaidhyaraman

Losing data while migrating Delta table between workspaces

Use Delta Sharing and then clone the Delta table....

Last updated: April 28th, 2025 by Raghavan Vaidhyaraman

Find the number of files per partition in a Delta table

Retrieve the partition structure of a Delta table and display the number of data files per partition value....

Last updated: April 28th, 2025 by saritha.shivakumar

Intermittent long-running OPTIMIZE command for liquid clustered table

Use Databricks Runtime 16.2 or above to run the OPTIMIZE command....

Last updated: April 29th, 2025 by manikandan.ganesan

© Databricks 2022-2025. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
