NullPointerException when attempting to serialize Avro GenericRecord containing array

I am trying to publish Avro messages (to Kafka) and am getting a NullPointerException when writing the Avro object with a BinaryEncoder.

Here is the abbreviated stack trace:

java.lang.NullPointerException: null of array of com.mycode.DeeplyNestedObject of array of com.mycode.NestedObject of union of com.mycode.ParentObject
    at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60) ~[avro-1.8.1.jar:1.8.1]
    at com.mycode.KafkaAvroPublisher.send(KafkaAvroPublisher.java:61) ~[classes/:na]
    ....
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:112) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:87) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143) ~[avro-1.8.1.jar:1.8.1]
    at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105) ~[avro-1.8.1.jar:1.8.1]
    ... 55 common frames omitted

Here is the send method in my code where the exception occurs:

private static final EncoderFactory ENCODER_FACTORY = EncoderFactory.get();
private static final SpecificDatumWriter<ParentObject> PARENT_OBJECT_WRITER = new SpecificDatumWriter<>(ParentObject.SCHEMA$);
private BinaryEncoder binaryEncoder;  // reused across calls

public void send(ParentObject parentObject) {
    try {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        binaryEncoder = ENCODER_FACTORY.binaryEncoder(stream, binaryEncoder);
        PARENT_OBJECT_WRITER.write(parentObject, binaryEncoder);  // Exception HERE
        binaryEncoder.flush();
        producer.send(new ProducerRecord<>(topic, stream.toByteArray()));
    } catch (IOException ioe) {
        logger.debug("Problem publishing message to Kafka.", ioe);
    }
}

In the schema, NestedObject contains an array of DeeplyNestedObject. I have done enough debugging to verify that the NestedObject does in fact contain an array of DeeplyNestedObject, or an empty array if none are present. Here is the relevant portion of the schema:

[ { "namespace": "com.mycode.avro"
  , "type": "record"
  , "name": "NestedObject"
  , "fields":
    [ { "name": "timestamp", "type": "long", "doc": "Instant in time (milliseconds since epoch)." }
    , { "name": "objs", "type": { "type": "array", "items": "DeeplyNestedObject" }, "doc": "Elided." }
    ]
  }
]

I don't know enough about the objects you have, but from what I can see in your example, your Avro schema is incorrect.

DeeplyNestedObject is a record in Avro, so your schema would have to look like this:

{
  "type": "record",
  "name": "NestedObject",
  "namespace": "com.mycode.avro",
  "fields": [
    {
      "name": "timestamp",
      "type": "long"
    },
    {
      "name": "objs",
      "type": {
        "type": "record",
        "name": "DeeplyNestedObject",
        "fields": []
      }
    }
  ]
}

Of course, all the fields of DeeplyNestedObject need to be declared in the "fields": [] that belongs to the DeeplyNestedObject record.

Avro's stack trace is misleading. The problem is likely one level deeper than the class indicated in the exception message.

When it says "null of array of com.mycode.DeeplyNestedObject of array of com.mycode.NestedObject of union of com.mycode.ParentObject", it means that a field inside DeeplyNestedObject which is expected to be an array was found to be null. (It is an understandable mistake to read this as a null DeeplyNestedObject inside the NestedObject.)

You need to inspect the fields of DeeplyNestedObject and figure out which array is not being serialized properly. The problem most likely lies where the DeeplyNestedObject is created: it has a field of type array that is not being populated in all cases before the send method is called.
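The usual fix is to guarantee that every array-typed field holds at least an empty list before the object reaches the writer. A minimal sketch of that defensive step, using a hypothetical plain-Java stand-in for the Avro-generated class (the real class, its field name, and its getters are generated by the Avro compiler and will differ):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Avro-generated DeeplyNestedObject;
// the real generated class has the same pitfall: an array-typed
// field is null until explicitly set.
class DeeplyNestedObject {
    private List<String> items; // array-typed field, null unless populated

    public List<String> getItems() { return items; }
    public void setItems(List<String> items) { this.items = items; }
}

public class NullGuardExample {
    // Substitute an empty list for a missing array field so the
    // Avro writer never encounters null where an array is expected.
    static DeeplyNestedObject prepareForSend(DeeplyNestedObject obj) {
        if (obj.getItems() == null) {
            obj.setItems(new ArrayList<>());
        }
        return obj;
    }

    public static void main(String[] args) {
        DeeplyNestedObject obj = new DeeplyNestedObject(); // items is null here
        prepareForSend(obj);
        System.out.println(obj.getItems().isEmpty()); // prints "true"
    }
}
```

The same guard belongs wherever the real DeeplyNestedObject instances are constructed, so the serializer always sees an empty array rather than null.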