Problem
Apache Spark returns an error when trying to read from an Apache Avro data source if the Avro schema has a recursive reference.
org.apache.spark.sql.avro.IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark
Cause
Spark SQL does not support recursive references in an Avro data source because a recursive schema cannot be converted to a StructType. A StructType is a finite, fully expanded tree of fields, while a recursive reference would keep expanding without end.
Review the pull request [SPARK-25718][SQL] Detect recursive reference in Avro schema and throw exception for more information.
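To illustrate the idea, here is a minimal sketch of how a recursive reference can be detected by walking an Avro schema and tracking the record names already seen on the current path. The helper name hasRecursiveReference is hypothetical and this is not Spark's actual implementation; it only uses the public org.apache.avro.Schema API and assumes Avro is on the classpath. In a notebook, run it in a Scala cell.

```scala
import org.apache.avro.Schema
import scala.collection.JavaConverters._

// Hypothetical helper (not part of Spark or Avro): returns true when a record
// refers back to itself, directly or through nested fields, arrays, maps, or unions.
def hasRecursiveReference(schema: Schema, seen: Set[String] = Set.empty): Boolean =
  schema.getType match {
    case Schema.Type.RECORD =>
      val name = schema.getFullName
      // A record name that is already on the current path means the schema loops.
      seen.contains(name) ||
        schema.getFields.asScala.exists(f => hasRecursiveReference(f.schema(), seen + name))
    case Schema.Type.ARRAY => hasRecursiveReference(schema.getElementType, seen)
    case Schema.Type.MAP   => hasRecursiveReference(schema.getValueType, seen)
    case Schema.Type.UNION => schema.getTypes.asScala.exists(t => hasRecursiveReference(t, seen))
    case _                 => false // primitives, enums, and fixed types cannot recurse
  }

// Usage: a linked-list style record whose "next" field points back to the record itself.
val recursive = new Schema.Parser().parse(
  """{"type": "record", "name": "LongList", "fields": [
    |  {"name": "value", "type": "long"},
    |  {"name": "next", "type": ["null", "LongList"]}
    |]}""".stripMargin)

println(hasRecursiveReference(recursive)) // true
```

Because the set of seen names is passed down per path rather than shared globally, the same record type can appear in two sibling fields without being flagged.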
Solution
You must avoid using recursive references in your Avro schema.
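How to remove the recursion depends on your data. As one possible sketch, assuming the linked structure can be rewritten as a flat collection, a linked list of longs could be modeled as a single record with an array field. The field name values is hypothetical and chosen only for illustration.

```scala
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters

// Hypothetical non-recursive alternative: store all values in one array
// instead of chaining records through a self-referencing "next" field.
val flatSchema = new Schema.Parser().parse("""{
  "type": "record",
  "name": "LongList",
  "fields": [
    {"name": "values", "type": {"type": "array", "items": "long"}}
  ]
}""")

// With no self-reference, the conversion is expected to succeed.
val sqlType = SchemaConverters.toSqlType(flatSchema)
println(sqlType.dataType)
```

This changes the shape of the stored data, so existing files written with the recursive schema would need to be rewritten before Spark can read them this way.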
Test for recursive references
You can test your Avro schema for recursive references with SchemaConverters.toSqlType(<avro-schema>).
%scala
import org.apache.spark.sql.avro.SchemaConverters

SchemaConverters.toSqlType(<avro-schema>)
If the Avro schema contains recursive references, SchemaConverters.toSqlType throws an IncompatibleSchemaException.
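If you want to run this check programmatically, for example before starting a job, the call can be wrapped so that the failure becomes a return value instead of an uncaught exception. The following is a minimal sketch; the helper name isConvertible is hypothetical. It uses scala.util.Try, which catches any non-fatal exception, so a false result means the conversion failed for some reason, not necessarily because of a recursive reference.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters

// Hypothetical helper: returns true when Spark can convert the Avro schema
// to a Spark SQL type, and false (after printing the reason) when it cannot.
def isConvertible(schema: Schema): Boolean =
  Try(SchemaConverters.toSqlType(schema)) match {
    case Success(_) => true
    case Failure(e) =>
      println(s"Schema conversion failed: ${e.getMessage}")
      false
  }
```

Inspect the printed message to confirm that the failure is the recursive reference error and not a different schema problem.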
Example
- Create an Avro schema with a recursive reference.
%scala
import org.apache.avro.Schema

// The "next" field refers back to the enclosing LongList record.
val schema = new Schema.Parser().parse("""{
  "type": "record",
  "name": "LongList",
  "aliases": ["LinkedLongs"],
  "fields" : [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["null", "LongList"]}
  ]
}""")
- Test the schema with SchemaConverters.toSqlType.
%scala
import org.apache.spark.sql.avro.SchemaConverters

SchemaConverters.toSqlType(schema)
- It returns an IncompatibleSchemaException error.
IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark:
{
  "type" : "record",
  "name" : "LongList",
  "fields" : [ {
    "name" : "value",
    "type" : "long"
  }, {
    "name" : "next",
    "type" : [ "null", "LongList" ]
  } ],
  "aliases" : [ "LinkedLongs" ]
}