Problem
You have a column in a Delta table defined as CHAR(10), but the row data contains fewer than 10 characters. If you filter on that column to retrieve a row whose value has fewer than 10 characters, no data is returned. However, if you apply the upper() or trim() function to the column in the query, you get the correct results.
Example
This example code creates a table with a CHAR(10) column and inserts a record with the value “ABCDEF”. Selecting the row by that same value returns zero records.
%sql
create or replace table <catalog.db.table> (id DECIMAL(18,0), source CHAR(10), updated CHAR(1)) USING delta;
insert into <catalog.db.table> values(1234567890, 'ABCDEF', 'Y');
select * from <catalog.db.table> where source = 'ABCDEF';
Cause
CHAR type values that are shorter than the declared length are padded with spaces on read to reach the declared length. For example, a row in a CHAR(10) column with the value “ABCDEF” is read back as “ABCDEF    ” (six letters and four trailing spaces, for a total of 10 characters). The literal 'ABCDEF' in the WHERE clause is not padded, so the comparison fails and the query returns zero records.
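To observe the padding, you can check the length of the value on read and compare against a trimmed value. The following is a minimal sketch that assumes the example table created above.
%sql
-- The stored value 'ABCDEF' is read back padded to 10 characters.
select source, length(source) as source_length from <catalog.db.table>;
-- Trimming the column before the comparison returns the row as expected.
select * from <catalog.db.table> where trim(source) = 'ABCDEF';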
Info
The CHAR pad-on-read behavior change was introduced in Serverless 2024.15. For more information, review the Serverless 2024.15 release notes (AWS | Azure | GCP).
Solution
As a best practice, pad the literal values in your queries with spaces so they match the declared length of the CHAR field.
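For example, for the CHAR(10) column created above, you can pad the literal to 10 characters yourself or let the built-in rpad() function add the trailing spaces. This is a sketch that assumes the example table from earlier.
%sql
-- Pad the literal to the declared CHAR(10) length (six letters plus four spaces).
select * from <catalog.db.table> where source = 'ABCDEF    ';
-- Alternatively, use rpad() to pad the literal with spaces to 10 characters.
select * from <catalog.db.table> where source = rpad('ABCDEF', 10);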
If you do not want to pad your reads, you can set spark.conf.set("spark.sql.legacy.charVarcharAsString", "true") at the beginning of your notebook. This configures CHAR and VARCHAR fields to behave as strings, so you can query the values without padded spaces.
Example
%python
spark.conf.set("spark.sql.legacy.charVarcharAsString", "true")
%sql
select * from <catalog.db.table> where source = 'ABCDEF';
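If you prefer to stay in Python, the same workaround can run in a single cell. This is a minimal sketch that assumes the placeholder table name used in the earlier examples.
%python
# Treat CHAR/VARCHAR columns as plain strings so values are not padded on read.
spark.conf.set("spark.sql.legacy.charVarcharAsString", "true")
# The unpadded literal now matches and the row is returned.
display(spark.sql("select * from <catalog.db.table> where source = 'ABCDEF'"))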