Data sources - Databricks

Create tables on JSON datasets

Create tables on JSON datasets; requires SerDe JAR....

Last updated: May 31st, 2022 by ram.sankarasubramanian

Delete table when underlying S3 bucket is deleted

Do not delete the contents of an S3 bucket before dropping a table that stores data in the bucket....

Last updated: May 31st, 2022 by Jose Gonzalez

Failure when mounting or accessing Azure Blob storage

Learn how to resolve a failure when mounting or accessing Azure Blob storage from Databricks....

Last updated: May 31st, 2022 by Adam Pavlacka

Unable to read files and list directories in a WASB filesystem

Learn how to interpret errors that occur when accessing WASB append blob types in Databricks....

Last updated: June 1st, 2022 by Adam Pavlacka

Optimize read performance from JDBC data sources

Learn how to optimize performance when reading from JDBC data sources in Databricks....

Last updated: June 1st, 2022 by Adam Pavlacka

Troubleshooting JDBC/ODBC access to Azure Data Lake Storage Gen2

Learn how to troubleshoot JDBC and ODBC access to Azure Data Lake Storage Gen2 from Databricks....

Last updated: June 1st, 2022 by Adam Pavlacka

CosmosDB-Spark connector library conflict

Learn how to resolve conflicts that arise when using the CosmosDB-Spark connector library with Databricks....

Last updated: June 1st, 2022 by Adam Pavlacka

Failure to detect encoding in JSON

Learn how to resolve a failure to detect encoding of input JSON files when using BOM with Databricks....

Last updated: June 1st, 2022 by Adam Pavlacka

Inconsistent timestamp results with JDBC applications

Timestamp records are inconsistent with JDBC applications when daylight saving time adjustments are made....

Last updated: June 1st, 2022 by manjunath.swamy

Kafka client terminated with OffsetOutOfRangeException

Kafka client is terminated with `OffsetOutOfRangeException` when trying to fetch messages...

Last updated: June 1st, 2022 by vikas.yadav

Apache Spark JDBC datasource query option doesn’t work for Oracle database

Learn how to resolve an error that occurs when using the Apache Spark JDBC datasource to connect to Oracle Database from Databricks....

Last updated: June 1st, 2022 by Adam Pavlacka

Accessing Redshift fails with NullPointerException

Learn how to resolve a `NullPointerException` error that occurs when you read a Redshift table....

Last updated: June 1st, 2022 by Adam Pavlacka

Redshift JDBC driver conflict issue

Learn how to resolve a Redshift JDBC SQLDriverWrapper driver conflict....

Last updated: June 1st, 2022 by Adam Pavlacka

ABFS client hangs if incorrect client ID or wrong path used

Trying to access an Azure Blob File System (ABFS) path results in a hung command when using Azure Data Lake Storage Gen2 (ADLS)....

Last updated: June 1st, 2022 by Adam Pavlacka

Reading a table fails due to AAD token timeout on ADLS Gen2

Accessing ADLS Gen2 storage fails if the AAD service principal token is expired or invalid....

Last updated: November 30th, 2022 by John.Lourdu

Recursive references in Avro schema are not allowed

Apache Avro data sources cannot have recursive references in the schema when used with Spark....

Last updated: February 19th, 2025 by saikrishna.pujari

Error when reading data from ADLS Gen1 with Sparklyr

Learn how to resolve errors that occur when reading data from Azure Data Lake Storage Gen1 with Sparklyr in Databricks....

Last updated: December 9th, 2022 by Adam Pavlacka

Long jobs fail when accessing ADLS

Long-running jobs that use Azure AD credential passthrough to access ADLS fail after 1 hour....

Last updated: December 9th, 2022 by huaming.liu

ADLS and WASB writes are being throttled

Learn how to resolve a "files and folders are being created at too high a rate" ADLS or WASB storage error....

Last updated: December 9th, 2022 by Adam Pavlacka

Unable to access Azure Data Lake Storage (ADLS) Gen1 when firewall is enabled

Learn how to troubleshoot access issues when connecting to Azure Data Lake Storage Gen1 from Databricks with a firewall enabled....

Last updated: December 9th, 2022 by Adam Pavlacka

SQL access control error when using Snowflake as a data source

Snowflake does not officially support schema as an option; you must use sfschema....

Last updated: January 20th, 2023 by John.Lourdu
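
As a rough illustration of the option rename described above, the sketch below rewrites a generic `schema` key to the connector-specific one (spelled `sfSchema` in Snowflake's connector documentation). The helper name `to_snowflake_options` and the sample connection values are hypothetical and not taken from the article.

```python
def to_snowflake_options(options: dict) -> dict:
    """Return a copy of options with a generic 'schema' key renamed to 'sfSchema'."""
    fixed = dict(options)
    if "schema" in fixed:
        # Keep an explicit sfSchema if the caller already set one.
        fixed.setdefault("sfSchema", fixed["schema"])
        del fixed["schema"]
    return fixed


# Hypothetical connection settings, for illustration only.
example = to_snowflake_options({
    "sfUrl": "example.snowflakecomputing.com",
    "sfDatabase": "ANALYTICS",
    "schema": "PUBLIC",
})
print(example)
```

The resulting dictionary could then be passed to a reader, for example with `spark.read.format("snowflake").options(**example)`, assuming the Snowflake connector is available on the cluster.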

Apache Spark reading .gzip files from S3 instead of decompressed data

Rename the files in S3 from .gzip to .gz....

Last updated: September 12th, 2024 by kuldeep.mishra
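
Hadoop-style readers choose a compression codec from the file extension, which is why the extension matters here. A minimal sketch of planning the rename follows; the helper `gz_key` and the sample object keys are hypothetical.

```python
def gz_key(key: str) -> str:
    """Map an object key ending in .gzip to the same key ending in .gz."""
    suffix = ".gzip"
    if key.endswith(suffix):
        return key[: -len(suffix)] + ".gz"
    return key  # no .gzip suffix, leave the key untouched


# Hypothetical object keys, for illustration only.
keys = ["logs/2024/part-0001.json.gzip", "logs/2024/part-0002.json.gz"]

# Only keys that actually change need to be renamed.
renames = {k: gz_key(k) for k in keys if gz_key(k) != k}
print(renames)
```

S3 has no in-place rename, so each pair in `renames` would typically be applied as a copy followed by a delete (for example with the boto3 client's `copy_object` and `delete_object`); that step is an assumption and is left out of the sketch.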

Column drift when reading multiple delimited files

Ensure that all files being processed together have the same schema....

Last updated: September 23rd, 2024 by lakshay.goel
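
One way to check this before reading files together is to compare their header rows. The sketch below does so on in-memory text; the function names and sample files are hypothetical, and it assumes every file has a header line.

```python
import csv
import io


def read_header(text: str, delimiter: str = ",") -> list:
    """Return the first row of a delimited file as a list of column names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


def find_drift(files: dict, delimiter: str = ",") -> dict:
    """Return {name: header} for files whose header differs from the first file's."""
    headers = {name: read_header(text, delimiter) for name, text in files.items()}
    if not headers:
        return {}
    reference = next(iter(headers.values()))
    return {name: h for name, h in headers.items() if h != reference}


# Hypothetical file contents: b.csv has the same columns in a different order.
sample = {
    "a.csv": "id,name,amount\n1,x,10\n",
    "b.csv": "id,amount,name\n2,20,y\n",
}
drifted = find_drift(sample)
print(drifted)
```

Files reported by `find_drift` could then be corrected or read separately from the rest.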

NullPointerException when reading shapefiles from cloud storage on a Mosaic and GDAL enabled cluster

Zip the entire shapefile and upload to a Unity Catalog volume or DBFS storage. ...

Last updated: September 12th, 2024 by jessica.santos

MULTIPLE_XML_DATA_SOURCE error while working with XML data

Remove the external XML library from the cluster. ...

Last updated: August 30th, 2024 by kaushal.vachhani

Databricks jobs using AWS Glue Data Catalog failing due to inability to reach cluster drivers

Ensure the Databricks cluster's IAM role has necessary permissions to access AWS Glue Data Catalog, update the IAM policy, and restart the cluster....

Last updated: October 15th, 2024 by raphael.balogo

ALTER TABLE (drop partition) error in Unity Catalog external tables

For CSV, JSON, ORC, or data formats, use partition metadata logging. ...

Last updated: October 15th, 2024 by lakshay.goel

“java.lang.IllegalStateException: Unexpected type: JSON” error when creating an external table from BigQuery

Upgrade your cluster to Databricks Runtime 14.0 or above. ...

Last updated: October 22nd, 2024 by jessica.santos

Schema mismatch issue while reading parquet files

Fix the file schema or read the files separately. ...

Last updated: October 23rd, 2024 by lakshay.goel

Reading a CSV file in DROPMALFORMED still includes malformed rows in the result

...

Last updated: November 7th, 2024 by shubham.bhusate

Cannot see ingested data loaded from an external ORC table

Use the same Hive interface to ingest and read your Delta table. ...

Last updated: November 17th, 2024 by lakshay.goel

Security Bulletin: Databricks JDBC Driver Vulnerability Advisory - [CVE-2024-49194]

Restart any long-running clusters and update your JDBC driver to the latest version....

Last updated: December 11th, 2024 by Adam Pavlacka

Error when creating a Delta table using the UI and external data in Delta format

Create the table using a notebook instead. ...

Last updated: December 13th, 2024 by manikandan.ganesan

Error when trying to access Azure storage account from China region

Rule out common Apache Spark configuration issues and ensure your Spark configuration for your OAuth endpoint is set to the China region....

Last updated: January 22nd, 2025 by saikumar.divvela

KeyProviderException error when trying to create an external table on an external schema with authentication at the notebook level

Set up authorization at the cluster configuration level instead....

Last updated: January 31st, 2025 by Ernesto Calderón

Multiple identical files being written to badRecordsPath instead of just one file when writing code to read a CSV file as a DataFrame

Use `.option("mode", "PERMISSIVE")` instead....

Last updated: March 27th, 2025 by Vidhi Khaitan
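
A minimal sketch of a CSV read in PERMISSIVE mode is shown below. The summary above is truncated, so the rest of the article's setup is not known; the `header` option, the function name, and the path are assumptions, and `spark` is taken to be an existing SparkSession such as the one predefined in Databricks notebooks.

```python
def read_csv_permissive(spark, path: str):
    """Read a CSV file as a DataFrame using Spark's PERMISSIVE parse mode."""
    return (
        spark.read
        .option("header", "true")      # assumption: the file has a header row
        .option("mode", "PERMISSIVE")  # keep malformed rows instead of failing
        .csv(path)
    )
```

In a notebook this might be called as `df = read_csv_permissive(spark, "/path/to/file.csv")`, where the path is a placeholder.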

Databricks Help Center

Table or view not found error when trying to query a federated table using SQL serverless compute

Oracle Federation failing to find a data source

Using LIKE statement causing slower performance in Lakehouse Federation query

Total size of serialized results of tasks is larger than spark.driver.maxResultSize when using ODBC connection

Getting error when trying to connect to SFTP server from Databricks using passwordless authentication

CharConversionException when importing non UDF data from IBM Db2 to Databricks

Error “The security token included in the request is invalid.” when trying to access a mount point

SQL query on BigQuery table fails with ClassCastException error

Error when trying to read data from MongoDB

Oracle JDBC connection fails when using keystore and truststore wallets in a standard compute
