pyspark flatMap error: TypeError: 'int' object is not iterable
This is sample code from my book:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("spark://chetan-ThinkPad-E470:7077").setAppName("FlatMap")
sc = SparkContext(conf=conf)
numbersRDD = sc.parallelize([1, 2, 3, 4])
actionRDD = numbersRDD.flatMap(lambda x: x + x).collect()
for values in actionRDD:
print(values)
I am getting this error:
TypeError: 'int' object is not iterable
at org.apache.spark.api.python.PythonRunner$$anon.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
You can't use flatMap on an Int object. flatMap can be applied to collection objects such as arrays or lists. On an RDD of type RDD[Integer] you can use the map function instead:
numbersRDD = sc.parallelize([1, 2, 3, 4])
actionRDD = numbersRDD.map(lambda x: x + x)
def printing(x):
    print(x)
actionRDD.foreach(printing)
which should print:
2
4
6
8
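If you do want flatMap semantics, the function you pass must return an iterable for each element; Spark then flattens those iterables into one RDD. A minimal pure-Python sketch of what flatMap does (not PySpark itself), which also shows why `lambda x: x + x` fails for ints:

```python
def flat_map(f, xs):
    # Apply f to each element and flatten the resulting iterables,
    # mimicking RDD.flatMap on a plain Python list.
    out = []
    for x in xs:
        out.extend(f(x))  # extend() requires f(x) to be iterable
    return out

# lambda x: x + x returns an int (e.g. 2 + 2 = 4), so extend() raises
# TypeError: 'int' object is not iterable -- the same error as in Spark.
# Returning a list per element works:
print(flat_map(lambda x: [x, x], [1, 2, 3, 4]))
# → [1, 1, 2, 2, 3, 3, 4, 4]
```

So in the original code, `numbersRDD.flatMap(lambda x: [x, x])` would run without error, while doubling each value is a job for `map`, not `flatMap`.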