Problem
You are querying a Delta table shared in Delta Sharing and you get a RESOURCE_LIMIT_EXCEEDED error.
The error may reference a timeout.
io.delta.sharing.spark.util.UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 400 Bad Request {"errorCode":"RESOURCE_LIMIT_EXCEEDED","message":"A timeout occurred when processing the table. If it continues to happen, please contact your data provider to share a smaller table instead."}.
The error may also reference table metadata that exceeds size limits.
{ 'errorCode': 'RESOURCE_LIMIT_EXCEEDED', 'message': 'The table metadata size exceeded limits'}
Cause
Delta Sharing has limits on the metadata size of a shared table.
- You are limited to 700k AddFiles actions in the DeltaLog. This is how many active files you can have in a shared Delta table.
- You are limited to 100k RemoveFiles actions in the DeltaLog. This is the number of files that have been deleted. This includes files that have been removed by operations like OPTIMIZE and MERGE.
Solution
You can run OPTIMIZE (AWS | Azure | GCP) on the shared Delta table to reduce the number of active files.
OPTIMIZE table_name [WHERE predicate] [ZORDER BY (col_name1 [, ...] ) ]
After running OPTIMIZE to reduce the number active files, you may hit the RemoveFiles limit if the OPTIMIZE command removed more than 100K files.
If this is the case, you can temporarily lower the delta.logRetentionDuration property. This lowers the length of time items stay in the DeltaLog. By setting it to a short retention time, for example 24 hours, the transaction log is quickly purged, which helps you stay under the 100K RemoveFiles limit.
%sql ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.logRetentionDuration'='24 hrs')
Once the issue is resolved, you should revert the delta.logRetentionDuration property back to 30 days, so you can continue to use the time travel feature.
%sql ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.logRetentionDuration'='30 days')
To prevent the issue from reoccurring, you should run OPTIMIZE periodically. This helps keep the number of active files below the limit.