Databricks Knowledge Base

Data management (AWS)

These articles can help you with Datasets, DataFrames, and other ways to structure data using Apache Spark and Databricks.

24 Articles in this category

Append to a DataFrame

Learn how to append to a DataFrame in Databricks....

Last updated: March 4th, 2022 by Adam Pavlacka

How to improve performance with bucketing

Learn how to improve Databricks performance by using bucketing....

Last updated: March 4th, 2022 by Adam Pavlacka

How to handle blob data contained in an XML file

Learn how to handle blob data contained in an XML file....

Last updated: March 4th, 2022 by Adam Pavlacka

Simplify chained transformations

Learn how to simplify chained transformations on your DataFrame in Databricks....

Last updated: May 25th, 2022 by Adam Pavlacka

How to dump tables in CSV, JSON, XML, text, or HTML format

Learn how to output tables from Databricks in CSV, JSON, XML, text, or HTML format....

Last updated: May 25th, 2022 by Adam Pavlacka

Get and set Apache Spark configuration properties in a notebook

...

Last updated: May 26th, 2022 by mathan.pillai

Hive UDFs

Learn how to create and use a Hive UDF for Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Prevent duplicated columns when joining two DataFrames

Learn how to prevent duplicated columns when joining two DataFrames in Databricks....

Last updated: October 13th, 2022 by Adam Pavlacka

Revoke all user privileges

Use a regex and a series of for loops to revoke all privileges for a single user....

Last updated: May 31st, 2022 by pavan.kumarchalamcharla

How to list and delete files faster in Databricks

Learn how to list and delete files faster in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

How to handle corrupted Parquet files with different schema

Learn how to read Parquet files with a specific schema using Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

No USAGE permission on database

User does not have USAGE permission on the database....

Last updated: May 31st, 2022 by rakesh.parija

Nulls and empty strings in a partitioned column save as nulls

Learn why nulls and empty strings in a partitioned column save as nulls in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Behavior of the randomSplit method

Learn about inconsistent behaviors when using the randomSplit method in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Job fails when using Spark-Avro to write decimal values to AWS Redshift

Learn how to resolve job failures when writing decimal values to AWS Redshift with Spark-Avro....

Last updated: May 31st, 2022 by Adam Pavlacka

Generate schema from case class

Learn how to generate a schema from a Scala case class....

Last updated: May 31st, 2022 by Adam Pavlacka

How to specify skew hints in Dataset and DataFrame-based join commands

Learn how to specify skew hints in Dataset and DataFrame-based join commands in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

How to update nested columns

Learn how to update nested columns in Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Incompatible schema in some files

Learn how to resolve incompatible schema in Parquet files with Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Access denied when writing to an S3 bucket using RDD

Learn how to resolve an access denied error when writing to an S3 bucket using RDD....

Last updated: May 31st, 2022 by Adam Pavlacka

Invalid timestamp when loading data into Amazon Redshift

Learn how to resolve an invalid timestamp error when loading data into Amazon Redshift....

Last updated: May 31st, 2022 by Adam Pavlacka

Unable to infer schema for ORC error

Apache Spark returns an error for ORC files if no schema is defined when reading from an empty directory or a base path with multiple subfolders....

Last updated: December 1st, 2022 by chandana.koppal

Object ownership is getting changed on dropping and recreating tables

Use TRUNCATE or REPLACE for tables and ALTER VIEW for views instead of dropping and recreating them....

Last updated: December 15th, 2022 by akash.bhat

User does not have permission SELECT on ANY File

Regular users cannot create tables without permission when access control is enabled....

Last updated: December 21st, 2022 by sivaprasad.cs


© Databricks 2022-2023. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
