What is the difference between DStream and Seq[RDD]?

The definition of DStream from the documentation:

Discretized Stream or DStream is the basic abstraction provided by Spark Streaming. It represents a continuous stream of data, either the input data stream received from source, or the processed data stream generated by transforming the input stream. Internally, a DStream is represented by a continuous series of RDDs, which is Spark’s abstraction of an immutable, distributed dataset.

The question is: if a DStream is represented as a series of RDDs, can we make a Stream of RDDs and expect it to work like a DStream?

It would be great if someone could help me understand this with a code example.

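You are right. A DStream is logically a series of RDDs.

Spark Streaming exists to hide the process of creating that Seq[RDD], so that building it is the framework's job rather than yours.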
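To make the "your job versus the framework's job" point concrete, here is a minimal sketch of what doing it by hand looks like with plain Spark. It assumes spark-core is on the classpath and a local master; the object name SeqOfRdds and the helper doubleAll are made up for illustration. Note that every RDD has to be created explicitly, and the Seq is finite: nothing arrives over time.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SeqOfRdds {
  // Hypothetical helper: the outer map walks the Seq,
  // the inner map walks the elements of each RDD.
  def doubleAll(rdds: Seq[RDD[Int]]): Seq[RDD[Int]] =
    rdds.map(rdd => rdd.map(_ * 2))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("seq-of-rdds")
    val sc = new SparkContext(conf)
    try {
      // We build each "batch" ourselves; there is no batch interval,
      // no receiver, and no scheduler producing new RDDs.
      val rdds: Seq[RDD[Int]] = Seq(
        sc.parallelize(1 to 3),
        sc.parallelize(4 to 6)
      )

      // Prints "2, 4, 6" and then "8, 10, 12".
      doubleAll(rdds).foreach(rdd => println(rdd.collect().mkString(", ")))
    } finally {
      sc.stop()
    }
  }
}
```

With Spark Streaming, the equivalent of the `rdds` value is produced for you, one RDD per batch interval, for as long as the application runs.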

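Moreover, Spark Streaming gives you a nicer developer API, so you can think of a Seq[RDD] as a DStream: instead of rdds.map(rdd => your code goes here), you can simply write dstream.map(t => your code goes here). The two are not very different except for the types of rdd and t. With a DStream you are already working one level lower, on the elements inside each RDD rather than on the RDDs themselves.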
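Since the question asked for code: the closest thing to "a stream of RDDs that behaves like a DStream" is probably StreamingContext.queueStream, which turns a mutable Queue[RDD[T]] into a DStream. The sketch below assumes spark-streaming is on the classpath and a local master; the object name, the one-second batch interval, and the five-second timeout are arbitrary choices for illustration.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object QueueStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("queue-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(1)) // one batch per second

    // Our hand-made "series of RDDs".
    val queue = mutable.Queue[RDD[Int]]()
    queue += ssc.sparkContext.parallelize(1 to 3)
    queue += ssc.sparkContext.parallelize(4 to 6)

    // By default the framework dequeues one RDD per batch interval.
    val dstream = ssc.queueStream(queue)

    // t is an Int here, not an RDD[Int]: the function sees elements.
    val doubled = dstream.map(t => t * 2)

    // If you do need the RDD of each batch, transform gives it back.
    val alsoDoubled = dstream.transform(rdd => rdd.map(_ * 2))

    doubled.print()
    alsoDoubled.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few batches run
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

The user code is almost the same as with a Seq[RDD]. What the DStream adds is everything around it: the batch clock, the scheduling of one job per batch, and the fact that the series never has to exist in full at any one time.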