Problem
When attempting to read or write PyArrow-created feather files from an S3 bucket using a Unity Catalog (UC) cluster, the operation fails with the following error.
FileNotFoundError: [Errno 2] No such file or directory: 'file_path'
The failure persists when using the spark.read.csv and spark.read.format("arrow") methods. You notice the operation works in a non-UC cluster.
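For example, read attempts like the following fail on a UC cluster. (The S3 path shown is a placeholder.)
# Both attempts raise the error above on a UC cluster
df = spark.read.csv("s3://your-bucket/path/file.feather")
df = spark.read.format("arrow").load("s3://your-bucket/path/file.feather")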
Cause
The spark.read.csv method is designed to read CSV files, and the spark.read.format("arrow") method is designed to read Arrow files. Neither method is compatible with feather files.
Additionally, your UC cluster configuration may lack the necessary PyArrow library required for reading feather files.
Solution
- Ensure that the PyArrow library is installed in your Databricks Runtime cluster.
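If the library is not already installed, you can add it as a notebook-scoped library. The following command provides an example.
%pip install pyarrow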
- Use the pandas library's read_feather() function to read the feather file. The following code snippet provides an example.
import pandas as pd

# Read the feather file from its DBFS path ('/dbfs/path/to/your/file.feather' is a placeholder)
df = pd.read_feather('/dbfs/path/to/your/file.feather')
# Convert the pandas DataFrame to a Spark DataFrame
spark_df = spark.createDataFrame(df)
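To write a feather file back out, the reverse conversion follows the same pattern. This is a minimal sketch; the output path is a placeholder.
# Convert the Spark DataFrame back to pandas, then write it as a feather file
out_df = spark_df.toPandas()
out_df.to_feather('/dbfs/path/to/your/output.feather')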
- Enable Arrow-based columnar data transfers. In your cluster settings, under Advanced options > Spark tab, enter the following configuration in the Spark config field.
spark.sql.execution.arrow.pyspark.enabled true
Alternatively, set the property for the current session from a notebook.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
For more information, refer to the Convert between PySpark and pandas DataFrames documentation.