Problem
You have a column in a Delta table defined as CHAR(10), but the row data contains fewer than 10 characters. If you try to retrieve a row whose value is shorter than 10 characters, no data is returned. However, if you apply the upper or trim function to the column and then retrieve the row, you get the correct results.
Example
This example code creates a table with a CHAR(10) column and inserts a record with the value “ABCDEF”. Reading the same value back returns zero records.
%sql
create or replace table <catalog.db.table> (id DECIMAL(18,0),source CHAR(10), updated CHAR(1)) USING delta;
insert into <catalog.db.table> values(1234567890, 'ABCDEF','Y');
select * from <catalog.db.table> where source = 'ABCDEF'
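As noted in the problem statement, wrapping the column in trim returns the record. A quick check against the same placeholder table:
%sql
select * from <catalog.db.table> where trim(source) = 'ABCDEF';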
Cause
CHAR type columns whose values are shorter than the declared length are padded with spaces on read to fill out the missing length. For example, a row in a CHAR(10) column with the value “ABCDEF” is read back as “ABCDEF    ” (six letters and four spaces, for a total of 10 characters).
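This means a comparison against a literal padded to the declared length matches the row. A minimal sketch against the placeholder table from the example above (the literal ends with four spaces):
%sql
select * from <catalog.db.table> where source = 'ABCDEF    ';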
Info
The CHAR pad-on-read behavior change was introduced in Serverless 2024.15. For more information, review the Serverless 2024.15 release notes (AWS | Azure | GCP).
Solution
As a best practice, pad the values in your read queries with spaces so they match the declared length of the CHAR field.
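For example, you can use the rpad function to right-pad the comparison value with spaces to the declared length. A minimal sketch against the placeholder table created earlier:
%sql
select * from <catalog.db.table> where source = rpad('ABCDEF', 10, ' ');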
If you do not want to pad your reads, you can set spark.conf.set("spark.sql.legacy.charVarcharAsString", "true") at the beginning of your notebook. This configures CHAR and VARCHAR fields to behave as strings, so you can query the values without padded spaces.
Example
%python
spark.conf.set("spark.sql.legacy.charVarcharAsString", "true")
%sql
select * from <catalog.db.table> where source = 'ABCDEF';