Apache Spark Scala: does groupByKey maintain the order of values in the input RDD?

Question

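This may be a very basic question, and I apologize for that, but I couldn't find the answer online. Does groupByKey maintain the order of values? A value that appears first in the input RDD should also appear first in the output RDD. When I tried it, the order was preserved, but I would like an expert to confirm this. I need something like the following: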

Input RDD[(Int, Int)]
 1 20
 2 10
 1 8
 1 25

Output RDD
 1 20 8 25
 2 10

Answer 1

No. The groupByKey API documentation states:

Group the values for each key in the RDD into a single sequence. Hash-partitions the resulting RDD with the existing partitioner/parallelism level. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated.

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions@groupByKey():org.apache.spark.rdd.RDD[(K,Iterable[V])]
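If you do need the input order inside each group, one common workaround (not part of the answer above, so treat it as a sketch) is to tag every record with its position using zipWithIndex, group, and then sort each group by that saved position. The object OrderedGroupByKey and the helper groupPreservingOrder are hypothetical names; the sketch assumes spark-core is on the classpath and runs in local mode. It also assumes the input RDD itself has a deterministic order, because zipWithIndex numbers elements by partition index and then by position within each partition, and it notes that zipWithIndex launches an extra Spark job when the RDD has more than one partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object OrderedGroupByKey {

  // Hypothetical helper: groups values by key, then restores the order
  // in which the values appeared in the input RDD inside every group.
  def groupPreservingOrder(rdd: RDD[(Int, Int)]): RDD[(Int, Seq[Int])] =
    rdd
      .zipWithIndex()                               // ((key, value), position in input)
      .map { case ((k, v), idx) => (k, (idx, v)) }  // carry the position next to the value
      .groupByKey()                                 // order inside each group is NOT guaranteed here
      .mapValues(_.toSeq.sortBy(_._1).map(_._2))    // re-sort by position, then drop it

  def main(args: Array[String]): Unit = {
    // Local mode, for illustration only.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("ordered-groupByKey"))
    try {
      val input = sc.parallelize(Seq((1, 20), (2, 10), (1, 8), (1, 25)))
      groupPreservingOrder(input)
        .sortByKey() // keys come out in hash-partition order, so sort them for display
        .collect()
        .foreach { case (k, vs) => println(s"$k ${vs.mkString(" ")}") }
    } finally {
      sc.stop()
    }
  }
}
```

With the question's input this prints `1 20 8 25` followed by `2 10`. Sorting each group costs extra work per key; since groupByKey already materializes every group in memory, this mainly matters for keys with very large groups.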

scala

apache-spark

rdd