Recursive references in Avro schema are not allowed

Apache Avro data sources cannot have recursive references in the schema when used with Spark.

Written by saikrishna.pujari

Last published at: December 1st, 2022

Problem

Apache Spark returns an error when trying to read from an Apache Avro data source if the Avro schema has a recursive reference.

org.apache.spark.sql.avro.IncompatibleSchemaException:
Found recursive reference in Avro schema, which can not be processed by Spark

Cause

Spark SQL does not support recursive references in an Avro data source because a schema that refers to itself cannot be converted to a StructType.

Review the [SPARK-25718][SQL] Detect recursive reference in Avro schema and throw exception pull request for more information.

Solution

You must avoid using recursive references in your Avro schema.
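One possible way to remove a recursive reference is to replace the self-referencing field with a collection type. The sketch below is an illustration only: the LongValues record and its values field are hypothetical names, and it assumes a linked list of longs can be modeled as an array in your data.

```scala
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters

// Hypothetical non-recursive replacement for a linked list of longs:
// the chain of self-references becomes a single array field.
val flatSchema = new Schema.Parser().parse("""{
  "type": "record",
  "name": "LongValues",
  "fields": [
    {"name": "values", "type": {"type": "array", "items": "long"}}
  ]
}""")

// No record refers to itself, so the conversion is expected to succeed.
val sqlType = SchemaConverters.toSqlType(flatSchema)
println(sqlType.dataType.simpleString)
```

Whether an array, a map, or a bounded set of nested records is the right substitute depends on the shape of the data you need to store.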

Test for recursive references

You can test your Avro schema for recursive references with SchemaConverters.toSqlType(<avro-schema>).

%scala

import org.apache.spark.sql.avro.SchemaConverters
SchemaConverters.toSqlType(<avro-schema>)

If the Avro schema contains recursive references, SchemaConverters.toSqlType throws an IncompatibleSchemaException.
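If you want to run this check without stopping the notebook cell on failure, the call can be wrapped in scala.util.Try. This is a minimal sketch: isConvertible is a hypothetical helper name, and it treats any non-fatal exception from the conversion as a failure, not only the recursive-reference case.

```scala
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters
import scala.util.{Failure, Success, Try}

// Hypothetical helper: returns true when Spark can convert the Avro schema
// to a Spark SQL type, and false when the conversion throws.
def isConvertible(avroSchema: Schema): Boolean =
  Try(SchemaConverters.toSqlType(avroSchema)) match {
    case Success(converted) =>
      println(s"Converted to: ${converted.dataType.simpleString}")
      true
    case Failure(e) =>
      // For a recursive schema the message starts with
      // "Found recursive reference in Avro schema".
      println(s"Conversion failed: ${e.getMessage}")
      false
  }
```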

Example

  1. Create an Avro schema with a recursive reference.
    %scala
    
    import org.apache.avro.Schema
    val schema = new Schema.Parser().parse("""{
      "type": "record",
      "name": "LongList",
      "aliases": ["LinkedLongs"],                     
      "fields" : [
        {"name": "value", "type": "long"},             
        {"name": "next", "type": ["null", "LongList"]} 
      ]
    }""")
  2. Test the schema with SchemaConverters.toSqlType.
    %scala
    
    import org.apache.spark.sql.avro.SchemaConverters
    SchemaConverters.toSqlType(schema)
  3. It returns an IncompatibleSchemaException error.
    IncompatibleSchemaException: Found recursive reference in Avro schema, which can not be processed by Spark: {  "type" : "record",  "name" : "LongList",  "fields" : [ {    "name" : "value",    "type" : "long"  }, {    "name" : "next",    "type" : [ "null", "LongList" ]  } ],  "aliases" : [ "LinkedLongs" ] }
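The error message prints the whole schema but does not name the record that refers to itself. To locate it, you can walk the schema with the Avro API directly. The following is a sketch, not Spark's own implementation: recursiveRecords is a hypothetical helper that tracks the record names on the current path and reports any record that appears inside itself.

```scala
import org.apache.avro.Schema
import org.apache.avro.Schema.Type
import scala.collection.JavaConverters._

// Hypothetical helper: returns the full names of records that are referenced
// from within their own definition. `ancestors` holds the record names
// already seen on the path from the root to the current schema node.
def recursiveRecords(schema: Schema, ancestors: Set[String] = Set.empty): Set[String] =
  schema.getType match {
    case Type.RECORD =>
      val name = schema.getFullName
      if (ancestors.contains(name)) Set(name) // the record refers back to itself
      else schema.getFields.asScala
        .flatMap(field => recursiveRecords(field.schema, ancestors + name))
        .toSet
    case Type.ARRAY => recursiveRecords(schema.getElementType, ancestors)
    case Type.MAP   => recursiveRecords(schema.getValueType, ancestors)
    case Type.UNION => schema.getTypes.asScala.flatMap(recursiveRecords(_, ancestors)).toSet
    case _          => Set.empty // primitives, enums, and fixed types cannot recurse
  }
```

For the LongList schema from step 1, recursiveRecords(schema) returns Set(LongList). An empty set means this walk found no self-reference.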