Problem
You ingest data into a Delta table from an Apache ORC table (outside Databricks), but cannot see the data when querying the partition columns in the Delta table.
Cause
From Apache Spark 2.4 onwards, the Hive interface configuration `spark.sql.hive.convertMetastoreOrc` defaults to `true`. In older Spark versions, it defaults to `false`.
As a result, the Hive interface used when ingesting the Delta table can be different from the Hive interface used when reading the same Delta table. This mismatch leads to incorrect results when querying the Delta table.
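To see which interface a cluster is currently using, you can inspect the configuration value from a notebook or SQL session (a minimal check; the output depends on your Spark version and cluster configuration):

```sql
-- Returns the current value of the setting (true on Spark 2.4+, false on older versions
-- unless overridden)
SET spark.sql.hive.convertMetastoreOrc;
```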
Solution
Ensure you use the same Hive interface while ingesting and reading the Delta table.
If the data was ingested using a Spark version older than 2.4, also set the following configuration at the cluster level on the clusters that read the table.
* `spark.sql.hive.convertMetastoreOrc=false`
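For example, the setting can be added as a cluster-level Spark config entry (a sketch in the `spark-defaults` key-value style used by cluster Spark config):

```
spark.sql.hive.convertMetastoreOrc false
```

It can also be applied for the current session, for example with `spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")`, though a cluster-level setting is the safer option since it applies to every query on the cluster.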