MULTIPLE_XML_DATA_SOURCE error while working with XML data

Remove the external XML library from the cluster.

Written by kaushal.vachhani

Last published at: August 30th, 2024

Problem 

While working with XML data after migrating to Databricks Runtime 14.3, you receive an error stating that multiple XML data sources were detected.

AnalysisException: [MULTIPLE_XML_DATA_SOURCE] Detected multiple data sources with the name xml (com.databricks.spark.xml.DefaultSource, org.apache.spark.sql.execution.datasources.xml.XmlFileFormat)

Cause

Starting with Databricks Runtime 14.3, Databricks natively supports XML read and write operations. If an external Spark XML library (such as spark_xml_2_12_0_12_0.jar) is also installed on the cluster, it conflicts with the built-in XML data source: both register the name xml, as the two class names in the error message show, and Spark cannot determine which one to use.
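To confirm which data sources are colliding, the class names can be pulled out of the error text. The following is a minimal sketch, not an official diagnostic: the helper names conflicting_sources and is_external_spark_xml are hypothetical, and the package prefix used to recognize the external library is an assumption based on the class name shown in the error message.

```python
import re

# Error text copied from the Problem section of this article.
ERROR_MESSAGE = (
    "AnalysisException: [MULTIPLE_XML_DATA_SOURCE] Detected multiple data sources "
    "with the name xml (com.databricks.spark.xml.DefaultSource, "
    "org.apache.spark.sql.execution.datasources.xml.XmlFileFormat)"
)


def conflicting_sources(message):
    """Return the class names listed in the trailing parentheses of the message."""
    match = re.search(r"\(([^()]*)\)\s*$", message)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def is_external_spark_xml(class_name):
    # Assumption: the external spark-xml library lives under this package,
    # as the first class name in the error message suggests.
    return class_name.startswith("com.databricks.spark.xml.")


sources = conflicting_sources(ERROR_MESSAGE)
external = [name for name in sources if is_external_spark_xml(name)]
```

With the message above, sources holds both class names and external holds only the one that comes from the installed library, which is the one to remove.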

Solution

Remove the external XML library, such as spark_xml_2_12_0_12_0.jar, from the cluster, then restart the cluster so the removal takes effect.
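If a cluster has many libraries installed, a simple name filter can help spot the one to remove. This is a rough sketch under stated assumptions: the function external_xml_libraries and the sample list are hypothetical, the list of names is assumed to have been copied by hand from the cluster's Libraries tab, and the pattern only catches libraries whose name contains "spark" and "xml" joined by a hyphen, underscore, or dot.

```python
import re

# Matches jar names or Maven coordinates for the external spark-xml library,
# written with a hyphen, underscore, or dot between "spark" and "xml".
SPARK_XML_PATTERN = re.compile(r"spark[-_.]xml", re.IGNORECASE)


def external_xml_libraries(library_names):
    """Return the names that look like an external Spark XML library."""
    return [name for name in library_names if SPARK_XML_PATTERN.search(name)]


# Illustrative input only; replace with the names shown for your cluster.
installed = [
    "spark_xml_2_12_0_12_0.jar",             # name used in this article
    "com.databricks:spark-xml_2.12:0.12.0",  # illustrative Maven coordinate form
    "my_etl_helpers.whl",                    # hypothetical unrelated library
]

to_remove = external_xml_libraries(installed)
```

Here to_remove contains the first two entries and leaves the unrelated wheel alone. A renamed jar would not match, so treat the result as a starting point and not a complete check.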

For more information about the XML format in Databricks, refer to the Read and write XML files (AWS | Azure | GCP) documentation.