Excluding "None" from output of map function
I have this code:
fileRDD.map(positive)\
.map(lambda x: [x,1])\
.reduceByKey(lambda x,y: x+y)\
.take(10)
The output is:
[(None, 3194395),
(0, 240597),
(1, 224805),
(2, 210585),
(3, 198246),
(4, 202869),
(5, 92615),
(6, 60493)]
How do I remove the None row from the output? (I only need the results for 0 through 6.)
Use the filter function on the RDD:
rdd = spark.sparkContext.parallelize([
(None, 3194395), (0, 240597), (1, 224805),
(2, 210585), (3, 198246), (4, 202869),
(5, 92615), (6, 60493)
])
rdd1 = rdd.filter(lambda x: x[0] is not None)
print(rdd1.collect())
#[(0, 240597), (1, 224805), (2, 210585), (3, 198246), (4, 202869), (5, 92615), (6, 60493)]
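A variant worth considering (my suggestion, not part of the answer above): apply the filter *before* reduceByKey, e.g. `fileRDD.map(positive).filter(lambda x: x is not None).map(lambda x: (x, 1)).reduceByKey(lambda a, b: a + b)`, so the ~3.2M None records never enter the shuffle. The sketch below simulates that idea locally without Spark; the `labels` list is a made-up stand-in for the output of `fileRDD.map(positive)`.

```python
from collections import Counter

# Stand-in for fileRDD.map(positive): some records map to None.
labels = [None, 0, 1, None, 2, 0, None]

# Drop None before counting -- mirrors .filter(lambda x: x is not None)
# placed ahead of the (x, 1) mapping and reduceByKey.
kept = [x for x in labels if x is not None]

# Counter plays the role of reduceByKey(lambda a, b: a + b) here.
counts = sorted(Counter(kept).items())
print(counts)  # [(0, 2), (1, 1), (2, 1)]
```

Filtering after reduceByKey (as in the answer) also works, but it aggregates the None key first and discards it afterwards; filtering early avoids that work.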