Empty string values convert to NULL values when saving a table as CSV or text-based file format

Use Delta as the target format for CSV files or other text-based data formats.

Written by caio.cominato

Last published at: September 12th, 2024

Problem 

When saving a table as a CSV file or other text-based format, your empty string values are replaced with NULL values. 

Cause

Empty string values are interpreted as NULL during the Serialization/Deserialization (SerDe) step when saving tables as CSV files or other text-based data formats which do not have a defined schema present. 

NULL values may also appear when performing joins on tables if the join condition is not met. 

Solution

Save your data in Delta format instead of CSV or text-based formats. Delta tables handle empty strings and NULL values more effectively, ensuring that empty strings are preserved during data insertion.

If you need to use CSV format, ensure: 

  • The options being used to both read and write can correctly handle Empty / NULL values. 
  • Any external systems reading the file can properly serialize the data as intended.

For more information, please review the Apache Spark CSV Files documentation.