Cannot read audit logs due to duplicate columns

Case-sensitive parameter names cause a duplicate column error when reading audit logs.

Written by Adam Pavlacka

Last published at: July 22nd, 2022


You are trying to read audit logs and get an AnalysisException: Found duplicate column(s) error."json").load("dbfs://mnt/logs/<path-to-logs>/date=2021-12-07")
//  AnalysisException: Found duplicate column(s) in the data schema: `<some_column>`


From November 2021 to December 2021, a limited number of Databricks SQL audit logs were published with duplicate case-sensitive parameter names. This can break the schema inference and generate an error when you try to read audit logs generated during this time.

The following parameter names were duplicated:

| Correct name | Duplicate name |
| ------------ | -------------- |
| dataSourceId | DataSourceId   |
| alertId      | AlertId        |
| dashboardId  | DashboardId    |


Create a new cluster to read and repair the audit log files.

Turn case sensitivity on so that schema inference can properly load the data.


spark.conf.set("spark.sql.caseSensitive", true)"json").load("dbfs://mnt/logs/pathToMyLogs/date=2021-12-07").write...
spark.conf.set("spark.sql.caseSensitive", false)

After the logs have been processed, turn case sensitivity off.



Enabling schema inference on shared clusters and/or clusters that perform other workloads could cause issues with other workloads.

Was this article helpful?