pyspark groupByKey's Iterable object (ResultIterable): what are the advantages of this?

Question

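I couldn't find any useful information online about the structure of the result after a groupByKey transformation. What can I do with the ResultIterable object returned by groupByKey? I would have expected a list of values for each key. I can convert it to a list, but I'm not sure whether I'm missing something.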

Answer 1

what are the advantages of this?

A special result iterable. This is used because the standard
iterator can not be pickled
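A plain-Python sketch of why a wrapper helps. It assumes a hypothetical class PicklableGroup that only imitates the idea behind ResultIterable (it is not pyspark's actual code): a generator, which is a one-shot iterator, cannot be pickled, while an Iterable that holds its values in a list can be pickled, shipped, and iterated again.

```python
import pickle
from collections.abc import Iterable


class PicklableGroup(Iterable):
    """Hypothetical, simplified stand-in for ResultIterable (not pyspark's code).
    It stores the grouped values in a list, so pickle can serialize it."""

    def __init__(self, data):
        self.data = list(data)

    def __iter__(self):
        # A fresh iterator on every call, so the group can be re-iterated.
        return iter(self.data)

    def __len__(self):
        return len(self.data)


def values_for_key():
    # A generator: pickle raises TypeError when asked to serialize it.
    yield from (1, 2, 3)


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


gen_ok = can_pickle(values_for_key())            # False
wrapped = PicklableGroup(values_for_key())
restored = pickle.loads(pickle.dumps(wrapped))   # round-trips fine
print(gen_ok, list(restored), len(restored))
```

Besides being picklable, the wrapper can be iterated more than once, which a bare iterator cannot.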

What can I do with the "ResultIterable"

You can do the same things you can do with any Iterable object:

class ResultIterable(collections.Iterable):

Specifically, you can assume that it implements the __iter__ dunder method, which means it can be iterated over or converted to another collection, and it can be used anywhere an iterable is expected.
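In PySpark the usual way to get lists back is rdd.groupByKey().mapValues(list). The sketch below shows the same idea without a Spark cluster; the class GroupedValues and the function group_by_key are hypothetical local imitations of ResultIterable and groupByKey, used only to show that any function accepting an iterable (list, sum, sorted, a for loop) works on the grouped values.

```python
from collections.abc import Iterable


class GroupedValues(Iterable):
    """Hypothetical stand-in for pyspark's ResultIterable (not the real class)."""

    def __init__(self, data):
        self.data = list(data)

    def __iter__(self):
        return iter(self.data)


def group_by_key(pairs):
    """Rough local imitation of groupByKey: (key, value) pairs -> (key, iterable)."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return [(key, GroupedValues(values)) for key, values in groups.items()]


pairs = [("a", 1), ("b", 2), ("a", 3)]
grouped = group_by_key(pairs)

# Anything that accepts an iterable works on the grouped values.
as_lists = [(k, list(v)) for k, v in grouped]                   # like mapValues(list)
totals = [(k, sum(v)) for k, v in grouped]                      # like mapValues(sum)
sorted_vals = [(k, sorted(v, reverse=True)) for k, v in grouped]
is_iterable = all(isinstance(v, Iterable) for _, v in grouped)
print(as_lists, totals, sorted_vals)
```

Note that in Spark the order of keys after groupByKey is not guaranteed; the local imitation simply keeps insertion order.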

I would have expected a list

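A list requires a concrete implementation of the collection. An Iterable allows for other options, including collections larger than memory, and the concrete implementation can change as needed.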
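The interface point can be illustrated in plain Python. The names LazyValues and mean are hypothetical; the sketch does not describe how pyspark stores grouped values internally. Code written against Iterable works the same whether the values sit in a list or are produced on demand, so the producer is free to change its representation.

```python
from collections.abc import Iterable


class LazyValues(Iterable):
    """Hypothetical iterable that computes its values on demand instead of
    storing them (standing in for, e.g., values streamed from disk)."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # A fresh generator per call, so the object can be iterated repeatedly.
        return (i * i for i in range(self.n))


def mean(values: Iterable) -> float:
    """Single pass over any iterable; never builds a list."""
    total = 0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else 0.0


in_memory = [0, 1, 4, 9]
lazy = LazyValues(4)
print(mean(in_memory), mean(lazy))
print(mean(LazyValues(1_000_000)))  # no million-element list is materialized
```

Converting to a list is still fine when the group is small; the Iterable type just does not force that choice.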
