You have special characters in your source files and are using the OSS library Spark-XML.
The special characters do not render correctly.
For example, “CLU®” is rendered as “CLU�”.
Spark-XML reads XML files as UTF-8 by default. Your XML files use a different character set, so characters outside the ASCII range are decoded incorrectly.
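The effect can be reproduced without Spark. The following is a minimal sketch in plain Python, assuming the source file was written in ISO-8859-1. It only illustrates the decoding mismatch and is not how Spark-XML is implemented internally.

```python
# "CLU®" as it would be stored in an ISO-8859-1 encoded file.
raw_bytes = "CLU®".encode("iso-8859-1")  # the ® sign is the single byte 0xAE

# Decoding with the wrong character set (UTF-8): a lone 0xAE byte is not
# valid UTF-8, so it is replaced with U+FFFD, the replacement character.
wrong = raw_bytes.decode("utf-8", errors="replace")

# Decoding with the character set the file was actually written in.
right = raw_bytes.decode("iso-8859-1")

print(repr(wrong))
print(repr(right))
```

The first value ends in the replacement character shown in the example above. The second is the original text.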
You must specify the character set you are using in your XML files when reading the data.
Use the charset option to define the character set when reading an XML file with Spark-XML.
For example, if your source file is using ISO-8859-1:
dfResult = spark.read.format('xml').schema(customSchema) \
    .options(rowTag='Entity') \
    .options(charset='ISO-8859-1') \
    .load('/<path-to-xml>/<sample-file>.xml')
Review the Spark-XML README file for more information on supported options.
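If you are not sure which character set a file uses, the XML declaration at the top of the file often names it. The sketch below shows one way to read that value. The declared_encoding helper is hypothetical (it is not part of Spark-XML), it assumes an ASCII-compatible encoding and a file with no byte order mark, and the declaration itself may be missing or wrong, so treat the result as a hint.

```python
import re

def declared_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in an XML declaration, or `default` if none is declared.

    Hypothetical helper for illustration; `head` is the first bytes of the file.
    """
    match = re.match(
        rb'\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', head
    )
    return match.group(1).decode("ascii") if match else default

# Sample bytes standing in for the start of an XML source file.
sample = b'<?xml version="1.0" encoding="ISO-8859-1"?><Entities></Entities>'
charset = declared_encoding(sample)
print(charset)
```

The returned value can then be passed as the charset option when reading the file.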