Problem
Your Databricks workflow fails due to an internal Apache Spark assertion error.
Example code
This example code results in an assertion error.
%python
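# `df` is assumed to be an existing DataFrame that includes `year` and `month`
# columns; a hypothetical sample for reproducing the issue:
df = spark.createDataFrame([(2025, 4, "a"), (2025, 4, "b")], ["year", "month", "value"])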
# Save the dataset to the table path (non-partitioned)
df.write.mode("overwrite").parquet("dbfs:/FileStore/Jayant/tableDir")
# Save the dataset again to a subdirectory path using `partitionBy` (partitioned)
df.write.mode("overwrite").partitionBy("year", "month").parquet("dbfs:/FileStore/Jayant/tableDir/2025/04")
# Reading this table using Spark results in an assertion error
spark.read.parquet("dbfs:/FileStore/Jayant/tableDir").display()
Error message
java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
dbfs:/filestore/jayant/tabledir
dbfs:/filestore/jayant/tabledir/2025/04
If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:316)
at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:155)
at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:205)
at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:110)
at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:205)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:494)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:394)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:350)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:350)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:871)
Cause
This issue occurs when reading a Spark table that was created with Spark partitioning but has ambiguous subdirectories under the table directory.
The error is caused by a failure in Spark's partition discovery logic while it attempts to infer the schema of a directory structure with an inconsistent layout.
When you read data with Spark, it attempts to infer partitioning automatically by parsing the input paths. This is handled internally by org.apache.spark.sql.execution.datasources.PartitioningUtils.parsePartitions. For more information, review the source code.
In the reported error, Spark detects two distinct paths:
dbfs:/filestore/jayant/tabledir
dbfs:/filestore/jayant/tabledir/2025/04
These paths represent conflicting directory structures: one appears unpartitioned, and the other resembles a partitioned layout based on path depth. But since the folder names (2025, 04) are not in Hive-style key=value format, Spark cannot map them to valid partition column names.
This ambiguity leads Spark to fail an internal assertion.
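To see what partition discovery is working with, you can list the table directory. The listing below is only an inspection aid; the exact file names will differ, but the mix of top-level parquet files and a bare 2025/ subdirectory is what triggers the conflict.
%python
# Inspect the layout that partition discovery has to reconcile.
# Expect top-level part-*.parquet files (from the first write) alongside a
# 2025/ subdirectory (from the second write) that is not in key=value form.
for entry in dbutils.fs.ls("dbfs:/FileStore/Jayant/tableDir"):
    print(entry.path)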
Solution
When reading partitioned data without partition column names in the path, set basePath to the common root to correctly infer partitioning.
Example code
This example code uses basePath so Spark can correctly read both partitions and return an output.
%python
spark.read.option("basePath", "dbfs:/FileStore/Jayant/tableDir").parquet("dbfs:/FileStore/Jayant/tableDir/2025/04").display()
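If you cannot read through a single root, the error message's other suggestion also works: load each root separately and then union the results. The glob pattern and union options below are assumptions for this particular layout; adjust them to match yours.
%python
# Read the flat files at the table root (the glob skips subdirectories) and the
# partitioned subdirectory separately, then union them. allowMissingColumns is
# a safety net in case the two reads do not expose the same column set.
df_flat = spark.read.parquet("dbfs:/FileStore/Jayant/tableDir/*.parquet")
df_part = spark.read.parquet("dbfs:/FileStore/Jayant/tableDir/2025/04")
df_flat.unionByName(df_part, allowMissingColumns=True).display()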
Preventive measures
- Avoid mixing non-partitioned files and partitioned subdirectories under the same path when relying on partition inference. Ensure that your table and partition directories follow a consistent layout without extraneous directories.
- If you want to create a partitioned table, always use Spark partitioning with partitionBy() and write to the root table directory, as shown in the sketch after this list.
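As a minimal sketch of the second point, write everything through partitionBy() at the table root. The files then land under Hive-style year=<value>/month=<value> directories, and the root can be read directly with no basePath needed.
%python
# Minimal sketch: a single partitioned write to the table root produces
# Hive-style year=<value>/month=<value> directories that Spark can infer.
df.write.mode("overwrite").partitionBy("year", "month").parquet("dbfs:/FileStore/Jayant/tableDir")
spark.read.parquet("dbfs:/FileStore/Jayant/tableDir").display()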