用于 Json 条消息的 Apache Beam Kafka IO - org.apache.kafka.common.errors.SerializationException

Question

我正在尝试熟悉 Apache beam Kafka IO 并遇到以下错误

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: org.apache.kafka.common.errors.SerializationException: Size of data received by LongDeserializer is not 8
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
    at com.andrewjones.KafkaConsumerExample.main(KafkaConsumerExample.java:58)
Caused by: org.apache.kafka.common.errors.SerializationException: Size of data received by LongDeserializer is not 8

以下是从 kafka 主题读取消息的一段代码。如果有人可以提供一些指示，我们将不胜感激。

public class KafkaConsumerJsonExample { 静态最终字符串 TOKENIZER_PATTERN = "[^\p{L}]+";

public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.create();

    // Create the Pipeline object with the options we defined above.
    Pipeline p = Pipeline.create(options);

    p.apply(KafkaIO.<Long, String>read()
            .withBootstrapServers("localhost:9092")
            .withTopic("myTopic2")
            .withKeyDeserializer(LongDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)

            .updateConsumerProperties(ImmutableMap.of("auto.offset.reset", (Object)"earliest"))

            // We're writing to a file, which does not support unbounded data sources. This line makes it bounded to
            // the first 5 records.
            // In reality, we would likely be writing to a data source that supports unbounded data, such as BigQuery.
            .withMaxNumRecords(5)

            .withoutMetadata() // PCollection<KV<Long, String>>
    )
            .apply(Values.<String>create())

            .apply(TextIO.write().to("wordcounts"));
    System.out.println("running data pipeline");
    p.run().waitUntilFinish();
}

}

Answer 1

问题是由于使用 LongDeserializer 键似乎是由 Long 以外的其他序列化程序序列化的，这取决于您如何生成记录。

因此，您可以使用适当的反序列化器，或者，如果密钥无关紧要，作为一种解决方法，也可以尝试对密钥使用 StringDeserializer。

用于 Json 条消息的 Apache Beam Kafka IO - org.apache.kafka.common.errors.SerializationException

Apache Beam Kafka IO for Json messages - org.apache.kafka.common.errors.SerializationException

apache-kafka

google-cloud-platform

google-cloud-dataflow

kafka-consumer-api

apache-beam