What is the difference in latency between Storm and Spark Streaming when processing tuples/messages?

1. According to the description below, do both Storm and Spark Streaming process messages/tuples in batches or in small/micro-batches? https://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html

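2. If the answer to the question above is yes, does that mean both technologies incur latency when processing messages/tuples? If so, why do I often hear that Storm has better latency than Spark Streaming, for example in the article below? https://www.ericsson.com/research-blog/data-knowledge/apache-storm-vs-spark-streaming/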

3. The Trident tutorial says: "Generally the size of those small batches will be on the order of thousands or millions of tuples, depending on your incoming throughput." So what is the actual size of a small batch: thousands or millions of tuples? If it is that large, how does Storm keep latency low?

https://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html

Storm's core API tries to process each event as it arrives. It is an event-at-a-time processing model, which can yield very low latency.

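Storm's Trident is a micro-batching model built on top of Storm's core API to provide exactly-once guarantees. Spark Streaming is also based on micro-batching and is comparable to Trident in terms of latency.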
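To make the latency difference concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not a measurement of Storm, Trident, or Spark Streaming: the function names, the arrival times, and the fixed time-based batch interval are all made up for illustration, and queueing and batch-size effects are ignored. An event-at-a-time engine pays only the processing time per event, while a micro-batching engine also makes each event wait until its batch closes.

```python
# Toy latency model (all times in milliseconds).
# Assumption: batches close on fixed time boundaries; real engines differ.

def per_event_latencies(arrivals_ms, processing_ms):
    """Event-at-a-time: each event is processed as soon as it arrives."""
    return [processing_ms for _ in arrivals_ms]

def micro_batch_latencies(arrivals_ms, batch_interval_ms, processing_ms):
    """Micro-batching: an event waits until the end of its batch interval."""
    latencies = []
    for t in arrivals_ms:
        # The batch containing t closes at the next interval boundary after t.
        batch_close = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(batch_close - t + processing_ms)
    return latencies

arrivals = [0, 100, 250, 400, 990]   # hypothetical arrival times
event_lat = per_event_latencies(arrivals, processing_ms=5)
batch_lat = micro_batch_latencies(arrivals, batch_interval_ms=1000, processing_ms=5)

avg_event = sum(event_lat) / len(event_lat)
avg_batch = sum(batch_lat) / len(batch_lat)
```

In this model the number of tuples in a batch depends on throughput, but the extra latency depends on how long the batch stays open, which is why a batch can hold thousands of tuples and still complete within a short interval.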

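So if you are looking for extremely low-latency processing, Storm's core API is the way to go. However, it guarantees only at-least-once processing, so in the event of a failure you may receive duplicate events, and the application must be able to handle them.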
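One common way for an application to tolerate at-least-once delivery is to make its updates idempotent, for example by remembering which event IDs it has already applied. The Python sketch below illustrates the idea only; it is not Storm API code. The `IdempotentCounter` class, the event IDs, and the amounts are hypothetical, and it assumes every event carries a unique ID and that the set of seen IDs fits in memory.

```python
class IdempotentCounter:
    """Sums event amounts, ignoring events whose ID was already applied."""

    def __init__(self):
        self.seen_ids = set()   # assumption: IDs are unique per logical event
        self.total = 0

    def handle(self, event_id, amount):
        """Apply the event once; return False if it was a duplicate."""
        if event_id in self.seen_ids:
            return False
        self.seen_ids.add(event_id)
        self.total += amount
        return True

# Hypothetical delivery sequence in which "e1" and "e2" are replayed after a failure.
deliveries = [("e1", 10), ("e2", 5), ("e1", 10), ("e3", 7), ("e2", 5)]

counter = IdempotentCounter()
applied = [counter.handle(event_id, amount) for event_id, amount in deliveries]
```

In a real deployment the set of seen IDs would need to be persisted and bounded (for example with a time window), since an in-memory set is lost when the worker restarts.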

Take a look at the streaming benchmark from Yahoo [1], which can provide more insight.

[1] https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at


apache-spark

apache-storm