Cannot see data ingested from an external ORC table

Use the same Hive interface to ingest and read your Delta table.

Written by lakshay.goel

Last published at: November 17th, 2024

Problem

You ingest data into a Delta table from an Apache ORC table (outside Databricks), but cannot see the data when querying the partition columns in the Delta table.
Cause

From Apache Spark 2.4 onwards, the Hive interface configuration `spark.sql.hive.convertMetastoreOrc` defaults to true. In older Spark versions, it defaults to false.

As a result, the Hive interface used when ingesting the Delta table can differ from the Hive interface used when reading the same Delta table, which leads to incorrect results when querying it.
Solution

Ensure you use the same Hive interface when ingesting and reading the Delta table.

If you ingested the data using a Spark version older than 2.4, also set the following configuration at the cluster level so that reads use the same Hive interface as the original ingestion.
* `spark.sql.hive.convertMetastoreOrc=false`
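
On Databricks, a cluster-level setting like this is typically entered in the cluster's Spark configuration (Compute > your cluster > Advanced options > Spark config; the exact UI path may vary by Databricks release). A minimal sketch of the config entry:

```
spark.sql.hive.convertMetastoreOrc false
```

The setting can also be applied per session, for example with `SET spark.sql.hive.convertMetastoreOrc=false;` in a SQL cell, but a cluster-level setting is safer because it guarantees every notebook and job on the cluster reads and writes through the same Hive interface.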