Behavioral changes for the CHAR data type on Serverless

Pad your reads with spaces to match the declared length of the CHAR field or set the legacy charVarcharAsString config to true.

Written by shanmugavel.chandrakasu

Last published at: October 18th, 2024

Problem

You have a column in a Delta table defined as CHAR(10), but the row data contains fewer than 10 characters. If you try to retrieve a row by filtering on a value with fewer than 10 characters, no data is returned. However, if you apply the upper or trim function to the column and then retrieve the row, you get the correct results.

Example

This example code creates a table with a CHAR(10) column and inserts a record with the value “ABCDEF”. When the same value is read back with an equality filter, the query returns zero records.

%sql

create or replace table <catalog.db.table> (id DECIMAL(18,0), source CHAR(10), updated CHAR(1)) USING delta;
insert into <catalog.db.table> values (1234567890, 'ABCDEF', 'Y');
select * from <catalog.db.table> where source = 'ABCDEF';
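
For comparison, filtering on the trimmed value returns the row, consistent with the behavior described above. The following is a minimal sketch assuming the same example table.

%sql

-- trim removes the trailing spaces added on read, so the comparison matches
select * from <catalog.db.table> where trim(source) = 'ABCDEF';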

Cause

CHAR type fields that contain fewer characters than the declared length are padded with spaces on read to fill the missing length. For example, a row in a CHAR(10) column with the value “ABCDEF” is padded to “ABCDEF    ” (six letters and four spaces, for a total of 10 characters) when a read is performed.
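
You can observe the padding directly. The following is a quick check, assuming the example table above; with pad-on-read, the six stored characters come back as a 10-character value.

%sql

-- length reflects the padded, declared width rather than the stored six characters
select source, length(source) from <catalog.db.table>;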

Info

The CHAR pad-on-read behavior change was introduced in Serverless 2024.15. For more information, review the Serverless 2024.15 release notes (AWS, Azure, GCP).

 

Solution

As a best practice, pad your reads with spaces so they match the declared length of the CHAR field.
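
For example, with the CHAR(10) column from the example table, you can pad the comparison literal to 10 characters yourself, or build it with rpad. This is a sketch assuming that table.

%sql

-- compare against a literal padded to the declared length of 10 characters
select * from <catalog.db.table> where source = 'ABCDEF    ';

-- equivalently, let rpad pad the literal with spaces to a length of 10
select * from <catalog.db.table> where source = rpad('ABCDEF', 10, ' ');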

If you do not want to pad your reads, set the spark.sql.legacy.charVarcharAsString configuration to true at the beginning of your notebook by running spark.conf.set("spark.sql.legacy.charVarcharAsString", "true"). This makes CHAR and VARCHAR fields behave as strings, so you can query the values without the padded spaces.

Example

%python

spark.conf.set("spark.sql.legacy.charVarcharAsString", "true")

 

%sql

select * from <catalog.db.table> where source = 'ABCDEF';