Load special characters with Spark-XML

Special characters are not rendering correctly. Use charset with Spark-XML.

Written by annapurna.hiriyur

Last published at: May 19th, 2022


You have special characters in your source files and are using the OSS library Spark-XML.

The special characters do not render correctly.

For example, “CLU®” is rendered as “CLU�”.
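The replacement character (�) appears when bytes written in one character set are decoded as UTF-8. The following standalone Python sketch illustrates the effect (it does not use Spark, and Spark's exact decoding behavior may differ):

```python
# "CLU®" encoded as ISO-8859-1: the ® sign becomes the single byte 0xAE.
raw_bytes = "CLU®".encode("iso-8859-1")

# 0xAE is not valid on its own in UTF-8, so decoding it as UTF-8 with
# replacement substitutes U+FFFD, the � replacement character.
decoded = raw_bytes.decode("utf-8", errors="replace")
print(decoded)  # CLU�
```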


Spark-XML supports the UTF-8 character set by default. You are using a different character set in your XML files.


You must specify the character set of your XML files when reading the data. Use the charset option to define the character set when reading an XML file with Spark-XML.

For example, if your source file is using ISO-8859-1:


dfResult = spark.read.format('xml').schema(customSchema) \
  .options(rowTag='Entity', charset='ISO-8859-1') \
  .load('/path/to/source-file.xml')  # replace with the path to your XML file

Review the Spark-XML README file for more information on supported options.
