Getting a Null Pointer Exception when I am trying to start PySpark
I am starting pyspark using the following command:
./bin/pyspark --master yarn --deploy-mode client --executor-memory 5g
and then I get the following error:
15/10/14 17:19:15 INFO spark.SparkContext: SparkContext already stopped.
Traceback (most recent call last):
File "/opt/spark-1.5.1/python/pyspark/shell.py", line 43, in <module>
sc = SparkContext(pyFiles=add_files)
File "/opt/spark-1.5.1/python/pyspark/context.py", line 113, in __init__
conf, jsc, profiler_cls)
File "/opt/spark-1.5.1/python/pyspark/context.py", line 178, in _do_init
self._jvm.PythonAccumulatorParam(host, port))
File "/opt/spark-1.5.1/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
File "/opt/spark-1.5.1/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.python.PythonAccumulatorParam.
: java.lang.NullPointerException
at org.apache.spark.api.python.PythonAccumulatorParam.<init>(PythonRDD.scala:825)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
For some reason I am also getting this message:
ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
and
WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@192.168.1.112:48644] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
Maybe that is the reason why the SparkContext gets stopped.
I am using Spark 1.5.1 with Hadoop 2.7.1 and Yarn 2.7.
Does anyone know why the Yarn application exits before anything has even happened?
For more information, here is my yarn-site.xml:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>26624</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>26624</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
And here is my mapred-site.xml:
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1640M</value>
<description>Heap size for map jobs.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>16384</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx13107M</value>
<description>Heap size for reduce jobs.</description>
</property>
I was able to solve this by adding
spark.yarn.am.memory 5g
to the spark-defaults.conf file.
I think this is a memory-related issue: the default value for this parameter is only 512m.
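If you would rather not edit spark-defaults.conf, the same property can also be passed on the command line with --conf when launching the shell. This is just a sketch reusing the 5g value from above; size it to your own cluster:
./bin/pyspark --master yarn --deploy-mode client \
  --executor-memory 5g \
  --conf spark.yarn.am.memory=5g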
I was having a somewhat similar problem. When I looked at the Hadoop GUI on port 8088 and clicked the application link in the ID column for my PySpark job, I saw the following error:
Uncaught exception: org.apache…InvalidResourceRequestException Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=8, maxVirtualCores=1
If I change my script to use --executor-cores 1 instead of my default (--executor-cores 8), then it works. Now I just need to get the admin to change some Yarn settings to allow more cores, e.g. yarn.scheduler.maximum-allocation-vcores.
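On the Yarn side, that change would presumably look something like the snippet below in yarn-site.xml. The value 8 here is only an illustration chosen to match the --executor-cores 8 request above; your admin may well pick a different limit:
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>8</value>
</property>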