Databricks Knowledge Base

Delta Lake (GCP)

These articles can help you with Delta Lake.

12 Articles in this category

Compare two versions of a Delta table

Delta Lake supports time travel, which allows you to query an older snapshot of a Delta table. One common use case is to compare two versions of a Delta table to identify what changed. For more details on time travel, see the Delta Lake time travel documentation (AWS | Azure | GCP). Identify all differences: You can use a SQL SELEC...

Last updated: May 10th, 2022 by mathan.pillai
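
The comparison described above can be sketched as a small helper that builds the SQL text. This is a minimal illustration, not the article's own query: it assumes Delta Lake's `VERSION AS OF` time travel clause combined with `EXCEPT ALL`, and the function name `version_diff_query` and the table name `events` are hypothetical.

```python
def version_diff_query(table: str, old_version: int, new_version: int) -> str:
    """Build SQL that returns rows present in old_version but absent from new_version."""
    if old_version < 0 or new_version < 0:
        raise ValueError("Delta table versions are non-negative integers")
    return (
        f"SELECT * FROM {table} VERSION AS OF {old_version} "
        "EXCEPT ALL "
        f"SELECT * FROM {table} VERSION AS OF {new_version}"
    )


# Hypothetical table name; pass the resulting string to spark.sql() in a notebook.
query = version_diff_query("events", 1, 2)
print(query)
```

Swapping the two version arguments gives the rows that exist only in the newer version, so running the query in both directions should cover all differences.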

Delta Merge cannot resolve nested field

Problem: You are attempting a Delta Merge with automatic schema evolution, but it fails with the error message "Delta Merge: cannot resolve 'field' due to data type mismatch". Cause: This can happen if you have made changes to the nested column fields. For example, assume we have a column called Address with the fields streetName, houseNumber, and city ne...

Last updated: May 10th, 2022 by Adam Pavlacka

How Delta cache behaves on an autoscaling cluster

This article describes how Delta cache (AWS | Azure | GCP) behaves on an autoscaling cluster, which adds or removes nodes as needed. When a cluster downscales and terminates nodes: Delta cache behaves in the same way as an RDD cache. Whenever a node goes down, all of the cached data on that node is lost. Delta cache data is not moved fr...

Last updated: May 10th, 2022 by Adam Pavlacka

How to improve performance of Delta Lake MERGE INTO queries using partition pruning

This article explains how to trigger partition pruning in Delta Lake MERGE INTO (AWS | Azure | GCP) queries from Databricks. Partition pruning is an optimization technique to limit the number of partitions that are inspected by a query. Discussion: MERGE INTO is an expensive operation when used with Delta tables. If you don’t partition the underlying...

Last updated: May 10th, 2022 by Adam Pavlacka
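
One common way to limit the partitions a MERGE INTO inspects is to put an explicit predicate on the target's partition column in the ON clause. The sketch below builds such a statement as text. It is an assumption-laden illustration rather than the article's own query: the helper name `merge_with_pruning`, the tables `sales` and `updates`, and the columns `id` and `sale_date` are all hypothetical, and partition values are assumed to be plain string literals.

```python
from typing import Sequence


def merge_with_pruning(target: str, source: str, key: str,
                       partition_col: str, partitions: Sequence[str]) -> str:
    """Build a MERGE INTO statement whose ON clause names the target partitions."""
    if not partitions:
        raise ValueError("at least one partition value is required")
    for value in partitions:
        # Keep the sketch simple: refuse values that would need SQL escaping.
        if "'" in value:
            raise ValueError(f"unsupported partition value: {value!r}")
    in_list = ", ".join(f"'{value}'" for value in partitions)
    return (
        f"MERGE INTO {target} AS t USING {source} AS s "
        f"ON t.{partition_col} IN ({in_list}) AND t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical tables and columns, for illustration only.
statement = merge_with_pruning("sales", "updates", "id", "sale_date",
                               ["2022-05-09", "2022-05-10"])
print(statement)
```

The partition list would typically come from the distinct partition values in the source data, so the statement only touches partitions that can actually contain matches.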

Best practices for dropping a managed Delta Lake table

Regardless of how you drop a managed table, it can take a significant amount of time, depending on the data size. Delta Lake managed tables in particular contain a lot of metadata in the form of transaction logs, and they can contain duplicate data files. If a Delta table has been in use for a long time, it can accumulate a very large amount of data...

Last updated: May 10th, 2022 by Adam Pavlacka

How to populate or update columns in an existing Delta table

Problem: You have an existing Delta table with a few empty columns. You need to populate or update those columns with data from a raw Parquet file. Solution: In this example, there is a customers table, which is an existing Delta table. It has an address column with missing values. The updated data exists in Parquet format. Create a DataFrame from th...

Last updated: May 10th, 2022 by Adam Pavlacka

Identify duplicate data on append operations

A common issue when performing append operations on Delta tables is duplicate data. For example, assume user 1 performs a write operation on Delta table A. At the same time, user 2 performs an append operation on Delta table A. This can lead to duplicate records in the table. In this article, we review basic troubleshooting steps that you can use to...

Last updated: May 10th, 2022 by chetan.kardekar

Optimize a Delta sink in a structured streaming application

You are using a Delta table as the sink for your structured streaming application and you want to optimize the Delta table so that queries are faster. If your structured streaming application has a very frequent trigger interval, it may not create sufficient files that are eligible for compaction in each microbatch. The autoOptimize operation compac...

Last updated: May 10th, 2022 by mathan.pillai

Unable to cast string to varchar

Problem: You are trying to cast a string type column to varchar but it isn’t working. Info: The varchar data type (AWS | Azure | GCP) is available in Databricks Runtime 8.0 and above. Create a simple Delta table with one column of type string. %sql CREATE OR REPLACE TABLE delta_table1 (`col1` string) USING DELTA; Use SHOW TABLE on the newly created ta...

Last updated: May 10th, 2022 by DD Sharma

Vacuuming with zero retention results in data loss

Problem: You add data to a Delta table, but the data disappears without warning. There is no obvious error message. Cause: This can happen when spark.databricks.delta.retentionDurationCheck.enabled is set to false and VACUUM is configured to retain 0 hours. %sql VACUUM <name-of-delta-table> RETAIN 0 HOURS When VACUUM is configured to retain 0 ho...

Last updated: May 10th, 2022 by DD Sharma
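
The cause described above suggests a simple client-side guard: refuse to build a VACUUM statement whose retention is below the default threshold unless the caller opts in explicitly. This is a minimal sketch under stated assumptions. It uses Delta Lake's default retention of 7 days (168 hours); the helper name `vacuum_statement`, the `allow_unsafe` flag, and the table name `events` are hypothetical and not part of any Databricks API.

```python
DEFAULT_RETENTION_HOURS = 7 * 24  # Delta Lake's default VACUUM retention (7 days)


def vacuum_statement(table: str, retain_hours: int, allow_unsafe: bool = False) -> str:
    """Build a VACUUM statement, rejecting short retention unless explicitly allowed."""
    if retain_hours < 0:
        raise ValueError("retention must be non-negative")
    if retain_hours < DEFAULT_RETENTION_HOURS and not allow_unsafe:
        # Mirrors the intent of the retention duration check described above.
        raise ValueError(
            f"retention of {retain_hours} hours is below the default "
            f"{DEFAULT_RETENTION_HOURS} hours and risks data loss"
        )
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


# Hypothetical table name; a retention at the default threshold passes the guard.
print(vacuum_statement("events", 168))
```

Calling `vacuum_statement("events", 0)` raises `ValueError`, so a zero-retention vacuum has to be requested deliberately with `allow_unsafe=True`.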

Z-Ordering will be ineffective, not collecting stats

Problem: You are trying to optimize a Delta table by Z-Ordering and receive an error about not collecting stats for the columns. AnalysisException: Z-Ordering on [col1, col2] will be ineffective, because we currently do not collect stats for these columns. Info: Please review Z-Ordering (multi-dimensional clustering) (AWS | Azure | GCP) for more infor...

Last updated: May 10th, 2022 by mathan.pillai

Change cluster config for Delta Live Tables pipeline

Problem: You are using Delta Live Tables and want to change the cluster configuration. You create a pipeline, but only have options to enable or disable Photon and select the number of workers. Cause: When you create a Delta Live Tables pipeline, most parameters are configured with default values. These values cannot be configured before the pipeline i...

Last updated: July 1st, 2022 by pratik.bhawsar


© Databricks 2022. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
