Apache Spark (Scala): does groupByKey maintain the order of values from the input RDD?
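This may be a very basic question, and I apologize for that, but I could not find an answer online. Does groupByKey maintain the order of values? That is, should the values that appear first in the input RDD also appear first in the output RDD? I tried it and the order was maintained, but I would like confirmation from experts. I need something like the following: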
Input RDD [Int, Int]
1 20
2 10
1 8
1 25
Output RDD
1 20 8 25
2 10
No. The Spark API documentation for groupByKey says:
Group the values for each key in the RDD into a single sequence. Hash-partitions the resulting RDD with the existing partitioner/parallelism level. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated.
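If the input order matters, one common workaround is to attach an explicit position to every record before grouping and sort each group by that position afterwards. The following is only a minimal sketch under some assumptions: the helper name `groupInInputOrder` and the object `OrderedGroupByKey` are made up for illustration, the RDD is created with `parallelize` (so `zipWithIndex` reflects the order of the source sequence), and a local master is used just to make the example runnable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object OrderedGroupByKey {

  // Hypothetical helper (not part of Spark): groups values per key and
  // restores the order in which they appeared in the input RDD.
  def groupInInputOrder(rdd: RDD[(Int, Int)]): RDD[(Int, Seq[Int])] =
    rdd
      .zipWithIndex()                               // ((key, value), position)
      .map { case ((k, v), idx) => (k, (idx, v)) }  // carry the position with the value
      .groupByKey()                                 // order inside each group is NOT guaranteed here
      .mapValues(_.toSeq.sortBy(_._1).map(_._2))    // sort by position, then drop it

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads, only for demonstration.
    val conf = new SparkConf().setAppName("ordered-groupByKey").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val input = sc.parallelize(Seq((1, 20), (2, 10), (1, 8), (1, 25)))
      groupInInputOrder(input)
        .sortByKey()
        .collect()
        .foreach { case (k, vs) => println(s"$k ${vs.mkString(" ")}") }
      // prints:
      // 1 20 8 25
      // 2 10
    } finally {
      sc.stop()
    }
  }
}
```

The sort happens inside each group after the shuffle, so the result no longer depends on the order in which groupByKey happens to deliver the values. Note that this only helps when the input RDD itself has a well-defined order; if it comes out of another shuffle, the positions assigned by `zipWithIndex` may not be stable either.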