Frequencies of numbers in a list - Pyspark
I have this code that outputs a list of values:
ARDD.map(function_B) \
.filter(lambda x: x is not None) \
.take(6)
Output:
['2','10','2','12','3','3']
How can I change the code to get this output instead?
[2:2, 3:2, 10:1, 12:1]
Use the map and reduceByKey RDD methods:
rdd = spark.sparkContext.parallelize(['2', '10', '2', '12', '3', '3'])

rdd1 = rdd.map(lambda x: (x, 1)) \
    .reduceByKey(lambda a, b: a + b) \
    .map(lambda x: f"{x[0]}:{x[1]}")  # pair each value with 1, sum counts per value, then format as "value:count"

print(rdd1.collect())
# ['10:1', '12:1', '3:2', '2:2']
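If the result should also match the ordering shown in the question (highest count first, then by numeric key), a sort step can be added before collecting. A minimal sketch, assuming spark is an existing SparkSession as in the snippet above:

from operator import add

rdd = spark.sparkContext.parallelize(['2', '10', '2', '12', '3', '3'])

counts = (rdd.map(lambda x: (x, 1))                       # pair each value with a count of 1
             .reduceByKey(add)                            # sum the counts per distinct value
             .sortBy(lambda kv: (-kv[1], int(kv[0])))     # count descending, then numeric key ascending
             .map(lambda kv: f"{kv[0]}:{kv[1]}"))         # format as "value:count"

print(counts.collect())
# ['2:2', '3:2', '10:1', '12:1']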