Problem
When you try to programmatically save Excel files to a Unity Catalog location using Databricks Runtime 15.4 LTS, the save fails with the following error message.
org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot find main error class 'RETRIES_EXCEEDED' SQLSTATE: XX000
When you check your driver logs, you see that Apache Spark Connect throws an error during the execution phase.
25/02/19 12:10:33 INFO QueryProfileListener: Query profile sent to logger, seq number: 131, app id: <app-id>
25/02/19 12:10:33 INFO ErrorUtils: Spark Connect error during: execute. UserId: <user-id>. SessionId: <session-id>.
java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.putArchiveEntry(Lorg/apache/commons/compress/archivers/zip/ZipArchiveEntry;)V
at org.apache.poi.openxml4j.opc.internal.ZipContentTypeManager.saveImpl(ZipContentTypeManager.java:65)
at org.apache.poi.openxml4j.opc.internal.ContentTypeManager.save(ContentTypeManager.java:450)
Cause
When you programmatically save a file to a Unity Catalog location, you may involve a third-party library, com.crealytics:spark-excel_2.12:3.5.1_0.20.4.
This library is not tested on Spark 3.5, which is the Spark version used in Databricks Runtime 15.4 LTS. The library is instead tested against the following Spark versions:
- 2.4.1
- 2.4.7
- 2.4.8
- 3.0.1
- 3.0.3
- 3.1.1
- 3.1.2
- 3.2.4
- 3.3.2
- 3.4.1
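The compatibility constraint above can be sketched as a small helper that, given a runtime's Spark version, reports whether the library has been tested against it. This is a hypothetical check for illustration; the version set is taken from the list above.

```python
# Spark versions the spark-excel library is tested against (from the list above).
TESTED_SPARK_VERSIONS = {
    "2.4.1", "2.4.7", "2.4.8",
    "3.0.1", "3.0.3",
    "3.1.1", "3.1.2",
    "3.2.4", "3.3.2", "3.4.1",
}

def is_tested(spark_version: str) -> bool:
    """Return True if spark-excel is tested against this Spark version."""
    return spark_version in TESTED_SPARK_VERSIONS

# Databricks Runtime 15.4 LTS ships Spark 3.5, which is not in the list,
# while Databricks Runtime 13.3 LTS ships Spark 3.4.1, which is.
print(is_tested("3.5.0"))  # False
print(is_tested("3.4.1"))  # True
```

In a notebook, you could pass the cluster's `spark.version` value to this check before relying on the library.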
Solution
To continue using this third-party library, use a Databricks Runtime version with a compatible Spark version against which the library is tested, for example Databricks Runtime 13.3 LTS.
For more information on possible Databricks Runtime versions, refer to the Databricks Runtime release notes versions and compatibility (AWS | Azure | GCP).
For more information about the library and Spark compatibility, refer to the spark-excel GitHub repo.
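As a sketch, on Databricks Runtime 13.3 LTS (Spark 3.4.1) you would install the library artifact whose version prefix matches that Spark version. The coordinate below assumes the library's `<spark-version>_<library-version>` naming scheme and should be verified against the releases published in the spark-excel repo before use.

```
com.crealytics:spark-excel_2.12:3.4.1_0.20.4
```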