如何等待有限流批量结果

Question

我有一个使用 spring 云流和 kafka 流构建的流处理应用程序，该系统从应用程序中获取日志并将它们与另一个流处理器所做的观察进行比较并产生一个分数，然后日志流被分数分割（高于和低于某个阈值）。

拓扑结构：

问题：

所以我的问题是如何正确实施"Log best observation selector processor"，处理日志时观察的数量有限，但可能有很多。

所以我想到了 2 个解决方案...

Group & Window log-scored-observations topic 按log id 然后归约以获得最高分。（问题：对所有观察结果进行评分可能需要比 window 更长的时间）
每次评分后发出评分完成消息，加入log-relevant-observations，使用log-scored-observations全局table&交互式查询检查每个观察id是否在全局 table 存储，当所有 id 都在存储映射到得分最高的观察时。（问题：global table 仅用于交互式查询时似乎不起作用）

实现我正在尝试的目标的最佳方法是什么？

我希望不要造成任何分区、磁盘或内存瓶颈。
当值从日志和观察中加入时，一切都有唯一的 ID 和相关 ID 的元组。

（编辑：使用图表和更改标题切换拓扑的文本描述）

Answer 1

解决方案 #2 似乎可行，但它发出了警告，因为交互式查询需要一些时间才能准备好 - 所以我使用 Transformer 实现了相同的解决方案：

@Slf4j
@Configuration
@RequiredArgsConstructor
@SuppressWarnings("unchecked")
public class LogBestObservationsSelectorProcessorConfig {
    private String logScoredObservationsStore = "log-scored-observations-store";

    private final Serde<LogEntryRelevantObservationIdTuple> logEntryRelevantObservationIdTupleSerde;
    private final Serde<LogRelevantObservationIdsTuple> logRelevantObservationIdsTupleSerde;
    private final Serde<LogEntryObservationMatchTuple> logEntryObservationMatchTupleSerde;
    private final Serde<LogEntryObservationMatchIdsRelevantObservationsTuple> logEntryObservationMatchIdsRelevantObservationsTupleSerde;

    @Bean
    public Function<
            GlobalKTable<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple>,
                Function<
                    KStream<LogEntryRelevantObservationIdTuple, LogEntryRelevantObservationIdTuple>,
                    Function<
                            KTable<String, LogRelevantObservationIds>,
                            KStream<String, LogEntryObservationMatchTuple>
                    >
                >
            >
    logBestObservationSelectorProcessor() {
        return (GlobalKTable<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple> logScoredObservationsTable) ->
                (KStream<LogEntryRelevantObservationIdTuple, LogEntryRelevantObservationIdTuple> logScoredObservationProcessedStream) ->
                        (KTable<String, LogRelevantObservationIdsTuple> logRelevantObservationIdsTable) -> {
            return logScoredObservationProcessedStream
                    .selectKey((k, v) -> k.getLogId())
                    .leftJoin(
                            logRelevantObservationIdsTable,
                            LogEntryObservationMatchIdsRelevantObservationsTuple::new,
                            Joined.with(
                                    Serdes.String(),
                                    logEntryRelevantObservationIdTupleSerde,
                                    logRelevantObservationIdsTupleSerde
                            )
                    )
                    .transform(() -> new LogEntryObservationMatchTransformer(logScoredObservationsStore))
                    .groupByKey(
                            Grouped.with(
                                Serdes.String(),
                                logEntryObservationMatchTupleSerde
                            )
                    )
                    .reduce(
                            (match1, match2) -> Double.compare(match1.getScore(), match2.getScore()) != -1 ? match1 : match2,
                            Materialized.with(
                                    Serdes.String(),
                                    logEntryObservationMatchTupleSerde
                            )
                    )
                    .toStream()
                    ;
        };
    }

    @RequiredArgsConstructor
    private static class LogEntryObservationMatchTransformer implements Transformer<String, LogEntryObservationMatchIdsRelevantObservationsTuple, KeyValue<String, LogEntryObservationMatchTuple>> {
        private final String stateStoreName;
        private ProcessorContext context;
        private TimestampedKeyValueStore<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple> kvStore;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
            this.kvStore = (TimestampedKeyValueStore<LogEntryRelevantObservationIdTuple, LogEntryObservationMatchTuple>) context.getStateStore(stateStoreName);
        }

        @Override
        public KeyValue<String, LogEntryObservationMatchTuple> transform(String logId, LogEntryObservationMatchIdsRelevantObservationsTuple value) {
            val observationIds = value.getLogEntryRelevantObservationsTuple().getRelevantObservations().getObservationIds();
            val allObservationsProcessed = observationIds.stream()
                    .allMatch((observationId) -> {
                        val key = LogEntryRelevantObservationIdTuple.newBuilder()
                                .setLogId(logId)
                                .setRelevantObservationId(observationId)
                                .build();
                        return kvStore.get(key) != null;
                    });
            if (!allObservationsProcessed) {
                return null;
            }

            val observationId = value.getLogEntryRelevantObservationIdTuple().getObservationId();
            val key = LogEntryRelevantObservationIdTuple.newBuilder()
                    .setLogId(logId)
                    .setRelevantObservationId(observationId)
                    .build();
            ValueAndTimestamp<LogEntryObservationMatchTuple> observationMatchValueAndTimestamp = kvStore.get(key);
            return new KeyValue<>(logId, observationMatchValueAndTimestamp.value());
        }

        @Override
        public void close() {

        }
    }
}

如何等待有限流批量结果

How to wait for a finite stream bulk result

apache-kafka

spring-cloud-stream

apache-kafka-streams