python 2.7 : create dictionary from list of sets

Question

After performing some operations, I get a list of sets, as follows:

from pyspark.mllib.fpm import FPGrowth

FreqItemset(items=[u'A_String_0'], freq=303)
FreqItemset(items=[u'A_String_0', u'Another_String_1'], freq=302)
FreqItemset(items=[u'B_String_1', u'A_String_0', u'A_OtherString_1'], freq=301)

From this list I want to create:

RDD

Dictionary, for example:

key: A_String_0 value: 303
key: A_String_0,Another_String_1 value: 302
key: B_String_1,A_String_0,A_OtherString_1 value: 301

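I want to continue the calculation to produce confidence and lift.

I tried running a for loop to get each item from the list.

The question is whether there is another, better way to create the RDD and/or the list here?

Thanks in advance.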

Answer 1

If you want an RDD, simply don't collect freqItemsets:
```
model = FPGrowth.train(transactions, minSupport=0.2, numPartitions=10)
freqItemsets = model.freqItemsets()
```
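You can of course parallelize the collected result:

result = model.freqItemsets().collect()
sc.parallelize(result)

I'm not sure why you would need this (it looks like an XY problem), but you can use comprehensions on the collected data: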
```
{tuple(x.items): x.freq for x in result}
```
or
```
{",".join(x.items): x.freq for x in result}
```
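
To try these comprehensions without a Spark cluster, here is a minimal sketch that uses a `namedtuple` as a stand-in for the collected `FreqItemset` rows. The stand-in is an assumption (it only mirrors the `items` and `freq` fields shown in the question), and the sample rows are copied from the question. The `frozenset` variant is an extra option, not part of the answer above, for lookups that should not depend on item order.

```python
from collections import namedtuple

# Stand-in for the rows returned by model.freqItemsets().collect().
# Assumption: only the `items` and `freq` fields from the question are needed.
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

result = [
    FreqItemset(items=[u'A_String_0'], freq=303),
    FreqItemset(items=[u'A_String_0', u'Another_String_1'], freq=302),
    FreqItemset(items=[u'B_String_1', u'A_String_0', u'A_OtherString_1'], freq=301),
]

# Keys keep the item order reported by FP-growth.
by_tuple = {tuple(x.items): x.freq for x in result}
by_string = {",".join(x.items): x.freq for x in result}

# Order-independent keys (hypothetical variant).
by_set = {frozenset(x.items): x.freq for x in result}

print(by_string["A_String_0,Another_String_1"])  # 302
print(by_set[frozenset([u'Another_String_1', u'A_String_0'])])  # 302
```

Tuple and string keys only match when the items are given in the same order, so `frozenset` keys may be easier to work with when looking up sub-itemsets later.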

Generally speaking, if you want to apply further transformations to your data, don't collect it; process the data directly in Spark.

You should also take a look at the Scala API. It already implements association rules.
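
For reference, the confidence and lift mentioned in the question can be derived from itemset counts alone: confidence(A → B) = freq(A ∪ B) / freq(A), and lift(A → B) = confidence(A → B) / (freq(B) / N), where N is the total number of transactions. The following is a plain-Python sketch, not Spark's implementation. The function name `rules_from_counts`, the `n_transactions` parameter, and the toy counts are all hypothetical. It only considers rules with a single-item consequent, and it is only reasonable when the collected itemsets fit in driver memory.

```python
def rules_from_counts(counts, n_transactions):
    """Yield (antecedent, consequent, confidence, lift) tuples.

    counts maps frozenset(items) -> frequency. n_transactions is the total
    number of transactions; it is needed for lift and is not part of the
    frequent-itemset output, so the caller has to supply it.
    """
    for itemset, freq in counts.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            consequent = frozenset([item])
            # Skip rules whose parts are missing from the dictionary.
            if antecedent not in counts or consequent not in counts:
                continue
            confidence = float(freq) / counts[antecedent]
            consequent_support = float(counts[consequent]) / n_transactions
            lift = confidence / consequent_support
            yield antecedent, consequent, confidence, lift


# Toy counts, made up for illustration.
counts = {
    frozenset(["a"]): 8,
    frozenset(["b"]): 5,
    frozenset(["a", "b"]): 4,
}
rules = {(tuple(a), tuple(c)): (conf, lift)
         for a, c, conf, lift in rules_from_counts(counts, n_transactions=10)}
print(rules[(("a",), ("b",))])  # (0.5, 1.0)
```

The sketch uses `frozenset` keys so that the antecedent lookup does not depend on the order in which items were reported.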

python

python-2.7

apache-spark

rdd

pyspark