Problem
You have a pre-existing Delta table and want to add a new column, '<new-column>'
. When you use the following ALTER TABLE command, the reordering is only adjusted at the metadata level.
ALTER TABLE <table> ALTER COLUMN <new-column> AFTER <existing-column>;
Cause
ALTER TABLE
affects the metadata level only. It does not change the physical layout of the parquet files underneath. Limiting its impact ensure metadata-only changes to the schema, such as renaming or dropping columns, do not incur costly rewrites of the underlying data files.
For more information, refer to the Update Delta Lake table schema (AWS | Azure | GCP) documentation.
Solution
To rewrite files with new column order taken into account, use either CREATE OR REPLACE TABLE
or INSERT OVERWRITE TABLE
instead of ALTER TABLE
.
CREATE OR REPLACE TABLE
CREATE OR REPLACE TABLE <table> AS
SELECT <existing-column>, <new-column>, <other-existing-column>, ... -- desired column order
FROM <table>
INSERT OVERWRITE TABLE
INSERT OVERWRITE TABLE <table>
SELECT <existing-column>, <new-column>, <other-existing-column>, ...
FROM <table>