Databricks Knowledge Base

Data sources (Azure)

These articles can help you manage your data source integrations.

10 Articles in this category

Create tables on JSON datasets

This article covers how to create a table on JSON datasets using a SerDe. Download the JSON SerDe JAR: Open the hive-json-serde 1.3.8 download page, then click json-serde-1.3.8-jar-with-dependencies.jar to download the file. Info: You can review the Hive-JSON-Serde GitHub repo for more information on the JAR...

Last updated: May 31st, 2022 by ram.sankarasubramanian

Failure when mounting or accessing Azure Blob storage

Problem: When you try to access an existing mount point or create a new one, it fails with the error: WASB: Fails with java.lang.NullPointerException. Cause: This error can occur when the root mount path (such as /mnt/) is also mounted to blob storage. Run the following command to check if the root path is also mounted: %python dbutils.f...

Last updated: May 31st, 2022 by Adam Pavlacka

Unable to read files and list directories in a WASB filesystem

Problem: When you try to read a file on WASB with Spark, you get the following exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 19, 10.139.64.5, executor 0): shaded.databricks.org.apache.hadoop.fs.azure.AzureException: com.microsoft.a...

Last updated: June 1st, 2022 by Adam Pavlacka

Optimize read performance from JDBC data sources

Problem: Reading data from an external JDBC database is slow. How can I improve read performance? Solution: See the detailed discussion in the Databricks documentation on how to optimize performance when reading data (AWS | Azure | GCP) from an external JDBC database....

Last updated: June 1st, 2022 by Adam Pavlacka

Troubleshooting JDBC/ODBC access to Azure Data Lake Storage Gen2

Info: In general, you should use Databricks Runtime 5.2 and above, which include a built-in Azure Blob File System (ABFS) driver, when you want to access Azure Data Lake Storage Gen2 (ADLS Gen2). This article applies to users who are accessing ADLS Gen2 storage using JDBC/ODBC instead. Problem: When you run a SQL query from a JDBC or ODBC client to ac...

Last updated: June 1st, 2022 by Adam Pavlacka

CosmosDB-Spark connector library conflict

This article explains how to resolve an issue running applications that use the CosmosDB-Spark connector in the Databricks environment. Problem: Normally, if you add a Maven dependency to your Spark cluster, your app should be able to use the required connector libraries. But currently, if you simply specify the CosmosDB-Spark connector’s Maven co-ord...

Last updated: June 1st, 2022 by Adam Pavlacka

Failure to detect encoding in JSON

Problem: A Spark job fails with an exception containing the message: Invalid UTF-32 character 0x1414141(above 10ffff)  at char #1, byte #7) At org.apache.spark.sql.catalyst.json.JacksonParser.parse Cause: The JSON data source reader is able to automatically detect the encoding of input JSON files using the BOM at the beginning of the files. However, BOM is not ...

Last updated: June 1st, 2022 by Adam Pavlacka
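The BOM-based detection mentioned in the entry above can be illustrated with a minimal Python sketch. This is not Spark's implementation; the function name `detect_bom_encoding` and the returned labels are assumptions made for illustration. It only shows the general idea of reading the byte-order mark at the start of a file, and why a file without a BOM gives the reader nothing to detect.

```python
import codecs

# BOM signatures. The UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE
# BOM (FF FE), so the UTF-32 signatures must be checked before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom_encoding(data: bytes):
    """Return the encoding named by a leading BOM, or None if no BOM is present.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

When the helper returns None, the bytes carry no encoding marker, so the encoding has to be supplied some other way instead of being inferred.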

Inconsistent timestamp results with JDBC applications

Problem: When using JDBC applications with Databricks clusters, you see inconsistent java.sql.Timestamp results when switching between standard time and daylight saving time. Cause: Databricks clusters use UTC by default. java.sql.Timestamp uses the JVM’s local time zone. If a Databricks cluster returns 2021-07-12 21:43:08 as a string, the JVM parses i...

Last updated: June 1st, 2022 by manjunath.swamy
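The time zone mismatch described in the entry above can be sketched in Python. This illustrates the general effect rather than the JDBC driver's behavior: the same zone-less string denotes a different instant depending on the zone used to interpret it. The client offsets (UTC-8 for standard time, UTC-7 for daylight saving time) and the helper `to_instant` are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

RAW = "2021-07-12 21:43:08"  # the string value quoted in the article


def to_instant(raw: str, tz: timezone) -> datetime:
    """Interpret a zone-less timestamp string as wall-clock time in tz.

    Hypothetical helper for illustration only.
    """
    naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=tz)


# Hypothetical client zone modeled with fixed offsets.
client_standard = timezone(timedelta(hours=-8))  # standard time
client_daylight = timezone(timedelta(hours=-7))  # daylight saving time

as_utc = to_instant(RAW, timezone.utc)
as_standard = to_instant(RAW, client_standard)
as_daylight = to_instant(RAW, client_daylight)

# Distance between the instant the cluster meant (UTC) and the instant a
# client would derive by reading the string in its own local zone.
skew_standard = as_standard - as_utc
skew_daylight = as_daylight - as_utc
```

Under these assumed offsets the skew is 8 hours in standard time and 7 hours in daylight saving time, so the result shifts by an hour across the transition.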

Kafka client terminated with OffsetOutOfRangeException

Problem: You have an Apache Spark application that is trying to fetch messages from an Apache Kafka source when it is terminated with a kafkashaded.org.apache.kafka.clients.consumer.OffsetOutOfRangeException error message. Cause: Your Spark application is trying to fetch expired data offsets from Kafka. We generally see this in these two scenarios: Sc...

Last updated: June 1st, 2022 by vikas.yadav
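To make "expired data offsets" in the entry above concrete, here is a minimal Python sketch of the range check involved. It does not call Kafka or Spark; the function `classify_offset` and its return labels are hypothetical. The assumption is that a partition retains offsets from an earliest offset up to a log-end offset, and that retention moves the earliest offset forward as old data is deleted.

```python
def classify_offset(requested: int, earliest: int, latest: int) -> str:
    """Classify a requested offset against a partition's retained range.

    earliest: first offset still retained in the partition.
    latest:   log-end offset, i.e. the offset the next record will receive.
    Hypothetical helper for illustration only.
    """
    if requested < earliest:
        # The data at this offset has already been removed by retention.
        return "expired"
    if requested > latest:
        # The offset points past the end of the log.
        return "beyond_end"
    return "available"
```

For example, a consumer that stored offset 100 while the partition now retains offsets starting at 250 would be classified as "expired", which is the out-of-range condition.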

ABFS client hangs if incorrect client ID or wrong path used

Problem: You are using Azure Data Lake Storage (ADLS) Gen2. When you try to access an Azure Blob File System (ABFS) path from a Databricks cluster, the command hangs. If you enable the debug log, you can see the following stack trace in the driver logs: Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: https://login.microso...

Last updated: June 1st, 2022 by Adam Pavlacka


© Databricks 2022. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.