Kafka Connect S3 - JSON to Parquet

Does Kafka Connect S3 support converting JSON to Parquet? I'd appreciate pointers on what Kafka Connect S3 offers here, as well as suggestions for alternatives.

No, it does not. According to the docs page:

You must use the AvroConverter with ParquetFormat in the S3 Sink connector. Attempting to use the JsonConverter (with or without schemas) will result in a runtime exception.

One option is to use ksqlDB to reserialize your data to Avro first, for example:

CREATE STREAM source (COL1 VARCHAR, COL2 INT, COL3 BIGINT) WITH (VALUE_FORMAT='JSON', KAFKA_TOPIC='my_source_topic');

CREATE STREAM target WITH (KAFKA_TOPIC='my_target_topic', VALUE_FORMAT='AVRO') AS SELECT * FROM source;

Once that is done, you can sink my_target_topic to S3 in Parquet format (you can even do this from ksqlDB using CREATE SINK CONNECTOR…)
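As a rough sketch of that last step, a ksqlDB statement along the following lines could create the S3 sink. The connector name, bucket, region, Schema Registry URL, and flush size are all placeholders assumed for illustration, not values from the question; the property keys are the standard Confluent S3 sink connector settings, but check them against the documentation for your connector version.

```sql
-- Hypothetical connector name, bucket, region and Schema Registry URL:
-- replace these with the values for your own environment.
CREATE SINK CONNECTOR s3_parquet_sink WITH (
  'connector.class' = 'io.confluent.connect.s3.S3SinkConnector',
  'topics'          = 'my_target_topic',
  's3.bucket.name'  = 'my-example-bucket',
  's3.region'       = 'us-east-1',
  'storage.class'   = 'io.confluent.connect.s3.storage.S3Storage',
  -- Write Parquet files; this requires schema-aware input such as Avro.
  'format.class'    = 'io.confluent.connect.s3.format.parquet.ParquetFormat',
  -- Read the Avro records produced by the "target" stream.
  'value.converter' = 'io.confluent.connect.avro.AvroConverter',
  'value.converter.schema.registry.url' = 'http://schema-registry:8081',
  -- Number of records written per S3 object (arbitrary example value).
  'flush.size'      = '1000',
  'tasks.max'       = '1'
);
```

This assumes ksqlDB is configured to talk to a Kafka Connect cluster (or runs Connect embedded) and that the S3 sink connector plugin is installed there. The same properties can also be submitted as JSON to the Kafka Connect REST API instead.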