Problem
You have special characters in your source files and are using the OSS library Spark-XML.
The special characters do not render correctly.
For example, “CLU®” is rendered as “CLU�”.
Cause
Spark-XML supports the UTF-8 character set by default. You are using a different character set in your XML files.
Solution
You must specify the character set you are using in your XML files when reading the data.
Use the charset option to define the character set when reading an XML file with Spark-XML.
For example, if your source file is using ISO-8859-1:
%python dfResult = spark.read.format('xml').schema(customSchema) \ .options(rowTag='Entity') \ .options(charset='ISO-8859-1')\ .load('/<path-to-xml>/<sample-file>.xml')
Review the Spark-XML README file for more information on supported options.