Problem
When saving a table as a CSV file or other text-based format, your empty string values are replaced with NULL
values.
Cause
Empty string values are interpreted as NULL
during the Serialization/Deserialization (SerDe) step when saving tables as CSV files or other text-based data formats which do not have a defined schema present.
NULL
values may also appear when performing joins on tables if the join condition is not met.
Solution
Save your data in Delta format instead of CSV or text-based formats. Delta tables handle empty strings and NULL
values more effectively, ensuring that empty strings are preserved during data insertion.
If you need to use CSV format, ensure:
- The options being used to both read and write can correctly handle Empty /
NULL
values. - Any external systems reading the file can properly serialize the data as intended.
For more information, please review the Apache Spark CSV Files documentation.