Problem
When you use the Simba Spark ODBC Driver to access Databricks tables that contain special characters (such as the Swedish characters Å, Ä, and Ö) from an external application like SAS, the characters do not display correctly. Instead, they appear as junk characters.
Cause
The encoding used in Databricks, UTF-8, differs from the encoding used in the external application, ISO 8859-15/latin9. The Simba Spark ODBC Driver does not automatically transcode the characters from UTF-8 to ISO 8859-15/latin9, so the external application interprets the UTF-8 bytes as ISO 8859-15/latin9 and shows junk characters.
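The mismatch can be reproduced outside the driver. The following is a minimal Python sketch, not driver code: it only uses Python's built-in codecs to show what happens when UTF-8 bytes are read as ISO 8859-15, and what a correct transcode produces. The variable names are illustrative.

```python
# Illustration only: standard Python codecs, no ODBC driver involved.
original = "ÅÄÖ"

# Each of these characters takes two bytes in UTF-8.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as ISO 8859-15 yields two characters per letter,
# which is the "junk" an application sees.
misread = utf8_bytes.decode("iso-8859-15")

# A correct transcode to ISO 8859-15 uses one byte per character.
transcoded = original.encode("iso-8859-15")

print(utf8_bytes)                    # b'\xc3\x85\xc3\x84\xc3\x96'
print(transcoded)                    # b'\xc5\xc4\xd6'
print(len(original), len(misread))   # 3 6
```

Each Swedish character becomes two unrelated characters when the bytes are misread, which matches the symptom described under Problem.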
Solution
Set the environment variable SIMBA_APP_ANSI_ENCODING to ISO-8859-15 before starting the Simba Spark ODBC Driver. This variable instructs the driver to transcode the characters from UTF-8 to ISO 8859-15/latin9, ensuring that the special characters display correctly in the external application.
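If the external application is started from a script, the variable can also be supplied at launch. The sketch below assumes that the driver reads SIMBA_APP_ANSI_ENCODING from the environment of the process that loads it. The helper name run_with_ansi_encoding is hypothetical, and the child command is a stand-in that only echoes the variable back, not a real application.

```python
import os
import subprocess
import sys

ENCODING_VAR = "SIMBA_APP_ANSI_ENCODING"  # variable name from this article
ENCODING_VALUE = "ISO-8859-15"            # value from this article


def run_with_ansi_encoding(command):
    """Run `command` (a list of arguments) with the encoding variable set.

    Hypothetical helper: the variable is added to a copy of the current
    environment, so the parent process itself is left unchanged.
    """
    env = os.environ.copy()
    env[ENCODING_VAR] = ENCODING_VALUE
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in for the real application: a child Python process that prints
# the value it received. Replace with your application's command.
child = [sys.executable, "-c", f"import os; print(os.environ['{ENCODING_VAR}'])"]
print(run_with_ansi_encoding(child).stdout.strip())
```

Because the variable is passed only to the child process, other applications on the same machine keep their existing environment.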
Follow these steps to set the environment variable and configure the Simba Spark ODBC Driver.
1. Open the ODBC Data Source Administrator (64-bit) on the machine where the Simba Spark ODBC Driver is installed.
2. Navigate to the System DSN tab and select the Databricks DSN.
3. Click Configure to open the Simba Spark ODBC Driver configuration window.
4. Go to the Advanced options tab.
5. In the Environment variables section, add a new variable with the name SIMBA_APP_ANSI_ENCODING and set its value to ISO-8859-15.
6. Click OK to save the changes and close the configuration window.
7. Test the connection to confirm that the special characters now display correctly in the external application.
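When testing, it can help to check returned values programmatically rather than by eye. The following is a rough heuristic sketch, not part of the driver or of SAS: the function name looks_like_mojibake is hypothetical, and it assumes you already have the string values as your application received them. It flags text that turns into different, valid text when its ISO 8859-15 bytes are reinterpreted as UTF-8, which is the pattern produced by the encoding mismatch described under Cause.

```python
def looks_like_mojibake(text):
    """Return True if `text` looks like UTF-8 that was read as ISO 8859-15.

    Heuristic only: it re-encodes the text as ISO 8859-15 and tries to
    decode the bytes as UTF-8. Correctly displayed Latin-9 text normally
    fails that decode, while misread UTF-8 decodes to something different.
    """
    try:
        repaired = text.encode("iso-8859-15").decode("utf-8")
    except UnicodeError:
        # Not representable in ISO 8859-15, or not valid UTF-8 afterwards.
        return False
    return repaired != text


# Simulate a value that arrived garbled, and one that arrived correctly.
garbled = "Gävle".encode("utf-8").decode("iso-8859-15")
print(looks_like_mojibake(garbled))   # True
print(looks_like_mojibake("Gävle"))   # False
```

Plain ASCII values always pass this check, so it is only informative for columns that actually contain special characters.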