Lakehouse Federation query taking longer and using more resources than expected

Enable ANSI mode.

Written by raul.goncalves

Last published at: August 27th, 2025

Problem

You try to run the following Lakehouse Federation query on an all-purpose compute interactive cluster. 

SELECT * FROM table WHERE year(timestamp) = 2024 LIMIT 10

 

You notice the query takes longer than expected. When you check the metrics and cluster usage, you also notice more resources allocated.

 

When you execute the query from a SQL warehouse, the result returns within seconds.

 

Cause

ANSI mode is off by default for all-purpose computes. When ANSI mode is off, certain operations are prohibited, such as casting a timestamp field to a DATE field.

 

Further, in the case of the specific query, the year() function can only operate on DATE fields, so Apache Spark adds an explicit conversion of the timestamp column to DATE. When ANSI mode is off, this cast is not allowed, causing the pushdown query to fail. As a result, the filter and limit cannot be pushed down to the external source, leading to slow query performance.

 

By contrast, ANSI mode is on by default in SQL warehouses, allowing the query to run quickly.

 

Solution

Enable ANSI mode on all-purpose compute by running the following command in a notebook.

%sql
SET spark.sql.ansi.enabled = true;

 

Or set the following Spark config at the cluster level to allow casting and pushdown of the query.

spark.sql.ansi.enabled true

 

For details on how to apply Spark configs, refer to the “Spark configuration” section of the Compute configuration reference (AWS | Azure | GCP) documentation.

 

For more information on ANSI mode, refer to the ANSI compliance in Databricks Runtime (AWSAzureGCP) documentation.