Databricks Help Center

Data management

These articles can help you with Datasets, DataFrames, and other ways to structure data using Apache Spark and Databricks.

41 Articles in this category

Append to a DataFrame

Learn how to append to a DataFrame in Databricks....

Last updated: September 28th, 2022 by Adam Pavlacka

How to improve performance with bucketing

Learn how to improve Databricks performance by using bucketing....

Last updated: February 29th, 2024 by Adam Pavlacka

How to handle blob data contained in an XML file

Learn how to handle blob data contained in an XML file....

Last updated: March 4th, 2022 by Adam Pavlacka

Simplify chained transformations

Learn how to simplify chained transformations on your DataFrame in Databricks....

Last updated: May 25th, 2022 by Adam Pavlacka

How to dump tables in CSV, JSON, XML, text, or HTML format

Learn how to output tables from Databricks in CSV, JSON, XML, text, or HTML format....

Last updated: May 25th, 2022 by Adam Pavlacka

Get and set Apache Spark configuration properties in a notebook

...

Last updated: December 1st, 2023 by mathan.pillai

Hive UDFs

Learn how to create and use a Hive UDF for Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Prevent duplicated columns when joining two DataFrames

Learn how to prevent duplicated columns when joining two DataFrames in Databricks....

Last updated: October 13th, 2022 by Adam Pavlacka

Revoke all user privileges

Use a regex and a series of for loops to revoke all privileges for a single user....

Last updated: May 31st, 2022 by pavan.kumarchalamcharla

How to handle corrupted Parquet files with different schema

Learn how to read Parquet files with a specific schema using Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

No USAGE permission on database

User does not have USAGE permission on the database....

Last updated: May 31st, 2022 by rakesh.parija

Nulls and empty strings in a partitioned column save as nulls

Learn why nulls and empty strings in a partitioned column save as nulls in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Behavior of the randomSplit method

Learn about inconsistent behaviors when using the randomSplit method in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Job fails when using Spark-Avro to write decimal values to AWS Redshift

Learn how to resolve job failures when writing decimal values to AWS Redshift with Spark-Avro....

Last updated: May 31st, 2022 by Adam Pavlacka

Generate schema from case class

Learn how to generate a schema from a Scala case class....

Last updated: May 31st, 2022 by Adam Pavlacka

How to specify skew hints in Dataset and DataFrame-based join commands

Learn how to specify skew hints in Dataset and DataFrame-based join commands in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

How to update nested columns

Learn how to update nested columns in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Incompatible schema in some files

Learn how to resolve incompatible schema in Parquet files with Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Access denied when writing to an S3 bucket using RDD

Learn how to resolve an access denied error when writing to an S3 bucket using RDD....

Last updated: May 31st, 2022 by Adam Pavlacka

Invalid timestamp when loading data into Amazon Redshift

Learn how to resolve an invalid timestamp error when loading data into AWS Redshift....

Last updated: May 31st, 2022 by Adam Pavlacka

Unable to infer schema for ORC error

Apache Spark returns an error for ORC files if no schema is defined when reading from an empty directory or a base path with multiple subfolders....

Last updated: December 1st, 2022 by chandana.koppal

Access files written by Apache Spark on ADLS Gen1

Configure permissions to allow access to files that Apache Spark writes to ADLS Gen1 storage....

Last updated: December 9th, 2022 by dayanand.devarapalli

Object ownership is getting changed on dropping and recreating tables

Use TRUNCATE or REPLACE for tables and ALTER VIEW for views instead of dropping and recreating them....

Last updated: December 15th, 2022 by akash.bhat

User does not have permission SELECT on ANY File

Regular users cannot create tables without permission when access control is enabled....

Last updated: May 16th, 2023 by sivaprasad.cs

Sync fails with [UPGRADE_NOT_SUPPORTED.HIVE_SERDE] Table is not eligible for upgrade from Hive Metastore to Unity Catalog

Convert your Hive SerDe tables to Delta format. ...

Last updated: September 12th, 2024 by akash.bhat

Shared table not accessible in Delta Sharing using Python

Make sure correct access permissions are set and no network restrictions exist....

Last updated: September 12th, 2024 by amrith.v

Parquet table counts not being reflected based on concurrent updates

Manually refresh the table in the notebook where the count was initially taken....

Last updated: September 12th, 2024 by ram.sankarasubramanian

Empty string values convert to NULL values when saving a table as CSV or text-based file format

Use Delta as the target format for CSV files or other text-based data formats....

Last updated: September 12th, 2024 by caio.cominato

'CREATE OR REPLACE' SQL error in a Delta table

Correct the job schedule to ensure that only one query is executed at a time for a specific table....

Last updated: September 23rd, 2024 by lakshay.goel

Increased wait times between micro-batches in Auto Loader

Use file notification mode instead of the directory listing method....

Last updated: September 10th, 2024 by lakshay.goel

Addressing performance issues with over-partitioned Delta tables

Implement liquid clustering for improved performance....

Last updated: October 16th, 2024 by raphael.balogo

Count of corrupt_records returns zero in serverless

Collect records in an array and get the count of the array instead....

Last updated: December 11th, 2024 by lakshay.goel

Error INVALID_TEMP_OBJ_REFERENCE when trying to create a view

Persist the temporary object to a location, then create your view....

Last updated: January 16th, 2025 by lucas.rocha

Error when trying to parse XML in a shared mode cluster using the from_xml() function

Define an XML schema in a Data Definition Language (DDL) string first. ...

Last updated: January 17th, 2025 by Raghavan Vaidhyaraman

Error when trying to create a distributed Ray dataset using the from_spark() function

Set spark.databricks.pyspark.dataFrameChunk.enabled to true....

Last updated: January 30th, 2025 by Raghavan Vaidhyaraman

INVALID_PARAMETER_VALUE error when trying to access a table or view with fine-grained access control

Upgrade the cluster to a Databricks Runtime version newer than 15.4 and use a single user access mode cluster....

Last updated: January 30th, 2025 by raphael.balogo

Other users can see root (main) folder despite not having access permissions

Permanently delete items in the Trash and create separate folders for shared files or objects. ...

Last updated: February 12th, 2025 by monica.cao

Execution error when trying to mount a storage account

Unmount the nested path and mount each storage account separately at distinct mount points....

Last updated: March 18th, 2025 by guruprasad.bn

Resolve Spark directory structure conflicts

Fixing java.lang.AssertionError: assertion failed: Conflicting directory structures detected...

Last updated: April 26th, 2025 by jayant.sharma

Column values are assigned in the order they are passed into Row() as arguments, not to the column name indicated

Create the DataFrame from a list of dictionaries or use the row.asDict() method....

Last updated: April 28th, 2025 by Raghavan Vaidhyaraman

Insufficient privileges error when querying views in a Unity Catalog metastore

Conduct a traceback of the permission tree for the view table and grant access where needed....

Last updated: April 30th, 2025 by zhengxian.huang

© Databricks 2022-2025. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
