Problem
When you work with XML data after migrating to Databricks Runtime 14.3, you receive an error stating that multiple XML data sources were detected.
AnalysisException: [MULTIPLE_XML_DATA_SOURCE] Detected multiple data sources with the name xml (com.databricks.spark.xml.DefaultSource, org.apache.spark.sql.execution.datasources.xml.XmlFileFormat)
Cause
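Starting with Databricks Runtime 14.3, Databricks natively supports XML read and write operations. If an external Spark XML library (such as spark_xml_2_12_0_12_0.jar) is also installed on the cluster, both the library and the built-in implementation register a data source under the name xml, as the two class names in the error message show, and Spark cannot determine which one to use.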
Solution
Remove the external XML library, such as spark_xml_2_12_0_12_0.jar, from the cluster, then restart the cluster so the removal takes effect.
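If the cluster has many libraries installed, a small helper can make the offending entry easier to spot. The following is a minimal sketch, not an official tool: it assumes you have copied the library names from the cluster's Libraries tab into a list, and it assumes that external Spark XML artifacts contain "spark-xml" or "spark_xml" in their file name or Maven coordinate. The function name and the example DBFS path are hypothetical.

```python
import re

# Assumption: external Spark XML artifacts carry "spark-xml" or "spark_xml"
# somewhere in their JAR file name or Maven coordinate.
_SPARK_XML_PATTERN = re.compile(r"spark[-_]xml", re.IGNORECASE)


def find_external_xml_libraries(library_names):
    """Return the entries that look like an external Spark XML library."""
    return [name for name in library_names if _SPARK_XML_PATTERN.search(name)]


if __name__ == "__main__":
    # Hypothetical library list, copied by hand from the cluster configuration.
    installed = [
        "dbfs:/FileStore/jars/spark_xml_2_12_0_12_0.jar",
        "com.databricks:spark-xml_2.12:0.12.0",
        "org.postgresql:postgresql:42.7.3",
    ]
    for name in find_external_xml_libraries(installed):
        print(f"Remove from cluster: {name}")
```

The match is deliberately loose, so review each flagged entry before uninstalling it.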
For more information about the XML format in Databricks, please refer to the Read and write XML files (AWS | Azure | GCP) documentation.
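Once only the built-in data source remains on the cluster, a read through the name xml should resolve without the conflict. The sketch below is one way to confirm this from a notebook. It assumes a SparkSession named spark is available, as it is in Databricks notebooks; the function name, file path, and row tag are hypothetical placeholders.

```python
def read_xml_native(spark, path, row_tag):
    """Read an XML file through the data source name "xml".

    With no external Spark XML library on the cluster, this name
    refers only to the built-in implementation.
    """
    return (
        spark.read.format("xml")
        .option("rowTag", row_tag)  # XML element that is treated as one row
        .load(path)
    )
```

In a notebook you might call it as read_xml_native(spark, "/path/to/books.xml", "book") and inspect the result with printSchema(). If the MULTIPLE_XML_DATA_SOURCE error still appears, an external XML library is most likely still attached to the cluster.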