Problem
When cloning a Delta table by copying its data files to a new location and recreating it using CREATE TABLE ... LOCATION, the DESCRIBE HISTORY output shows different timestamp values even though the data is identical. You expected the commit history (including timestamps) to remain the same as in the original table.
Cause
Updating timestamp values is expected behavior. Apache Spark recreates the transaction log files during CREATE TABLE execution using a create API with overwrite mode.
What happens behind the scenes
- Data files are copied from one location (X) to another (Y) using file system copy mechanisms.
- A new table is created at location Y using the CREATE TABLE ... LOCATION command, which points to the copied data.
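The two steps above can be sketched as follows. The directory layout, file names, and table name are hypothetical stand-ins (a real copy would typically use cloud-native tooling such as a bucket-to-bucket sync rather than a local file copy), but the key point holds: the copy brings over both the data files and the _delta_log directory unchanged, and it is the subsequent CREATE TABLE step that rewrites the log.

```python
import json
import shutil
import tempfile
from pathlib import Path

# Toy stand-ins for the real locations (hypothetical layout).
root = Path(tempfile.mkdtemp())
src = root / "location_x"          # original Delta table directory (X)
(src / "_delta_log").mkdir(parents=True)
(src / "part-00000.parquet").write_bytes(b"fake data file")
(src / "_delta_log" / "00000000000000000000.json").write_text(
    json.dumps({"commitInfo": {"timestamp": 1600000000000}}))

# Step 1: file-system copy of the data files and transaction log to Y.
dst = root / "location_y"
shutil.copytree(src, dst)

# Step 2 (not run here): register the copy in the metastore, e.g.
#   CREATE TABLE events_copy USING DELTA LOCATION '<location_y>'
# This is the step that rewrites the log with a fresh timestamp.

copied = sorted(p.name for p in dst.rglob("*") if p.is_file())
print(copied)
```

After step 1 the copied log is byte-identical to the original; the timestamp divergence only appears once step 2 runs.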
What Spark does
When you run CREATE TABLE ... LOCATION on a path that already contains Delta data:
- Spark reads the existing _delta_log files.
- Spark writes a new commit entry in _delta_log to register the table in the new region’s metastore.
- This new commit has a fresh timestamp, which becomes the latest entry in DESCRIBE HISTORY.
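A minimal sketch of why the history changes: each commit file under _delta_log carries its own timestamp inside the commitInfo action, and a commit written at CREATE TABLE time is stamped with the current clock, not the original one. The single-file layout and field values below are a simplified, hypothetical stand-in for a real Delta log.

```python
import json
import tempfile
import time
from pathlib import Path

log_dir = Path(tempfile.mkdtemp()) / "_delta_log"
log_dir.mkdir(parents=True)
commit = log_dir / "00000000000000000000.json"

# Original commit, as copied from location X (epoch millis, hypothetical).
original = {"commitInfo": {"timestamp": 1600000000000, "operation": "WRITE"}}
commit.write_text(json.dumps(original))

# CREATE TABLE ... LOCATION recreates the file with overwrite semantics
# ("overwrite: true" in the driver log below), stamping the current time
# into the new commitInfo action.
fresh = {"commitInfo": {"timestamp": int(time.time() * 1000),
                        "operation": "CREATE TABLE"}}
commit.write_text(json.dumps(fresh))

new_ts = json.loads(commit.read_text())["commitInfo"]["timestamp"]
print(new_ts > original["commitInfo"]["timestamp"])  # → True
```

DESCRIBE HISTORY then surfaces this fresh commit as the latest entry, which is why the cloned table's timestamps differ from the source's.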
The following example entry from the driver log4j log shows file creation with overwrite mode on a copied transaction log file.
INFO S3AFileSystem:V3: FS_OP_CREATE BUCKET[<bucket-name>] FILE[/test-prod/<id>/FileStore/<test-number>/_delta_log/00000000000000000000.json] Creating output stream; permission: { masked: rw-r--r--, unmasked: rw-rw-rw- }, overwrite: true, bufferSize: 65536, encryption: SSE_S3
Solution
This behavior is by design. Delta transaction history reflects when each transaction occurred, not when the data was originally written. There is no supported mechanism to preserve the original commit timestamps during table recreation; doing so would violate Delta’s transactional guarantees.
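If you need the original timestamps for auditing, one option is to export them before the copy and keep them alongside the cloned data. A sketch, assuming the history rows have already been fetched from DESCRIBE HISTORY on the source table into a list of dicts (the rows and output path here are hypothetical):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical rows previously fetched from DESCRIBE HISTORY on the source.
history = [
    {"version": 1, "timestamp": "2023-05-02T10:15:00Z", "operation": "WRITE"},
    {"version": 0, "timestamp": "2023-05-01T09:00:00Z",
     "operation": "CREATE TABLE"},
]

# Persist the audit trail as a plain JSON file next to the copied data.
out = Path(tempfile.mkdtemp()) / "source_table_history.json"
out.write_text(json.dumps(history, indent=2))

# Later, the original commit times can be read back independently of
# whatever DESCRIBE HISTORY reports for the recreated table.
restored = json.loads(out.read_text())
print(restored[0]["timestamp"])
```

This keeps the original audit information available without attempting to alter the new table's transaction log.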