Dataproc cluster Spark job submission fails to start NodeManager
We have a Dataproc cluster configured with 4 workers. The cluster is up and running, but whenever we try to submit a Spark job, we get this error:
YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager
Some of the messages seen in the Stackdriver logs are:
Daemon YARN_NODE_MANAGER failed to restart
Update:
The same issue also occurs when we add new worker nodes to the existing Dataproc cluster.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager, Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from <MasterNode DNS> , Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:374)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:252)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:845)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:912)
This error looks like a YARN NodeManager decommissioning issue. Could you check the following YARN include/exclude node configuration files on the Dataproc master GCE VM for errors:
- /etc/hadoop/conf/nodes_exclude
- /etc/hadoop/conf/nodes_include
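One way to do this check is a small helper like the sketch below. It assumes each file lists one hostname per line, that a host in the exclude file is rejected, and that a non-empty include file acts as an allow-list. The function name `check_node_lists` is hypothetical, and the ResourceManager may match on a different form of the name (for example the FQDN), so treat the result as a hint rather than a verdict.

```shell
#!/usr/bin/env bash
# Sketch: report whether a worker hostname would be blocked by the
# YARN include/exclude lists. check_node_lists is a hypothetical helper,
# not part of Dataproc or Hadoop.

check_node_lists() {
  local host="$1"
  local include_file="${2:-/etc/hadoop/conf/nodes_include}"
  local exclude_file="${3:-/etc/hadoop/conf/nodes_exclude}"

  # A host listed in the exclude file is refused at registration.
  if [ -f "$exclude_file" ] && grep -Fxq -- "$host" "$exclude_file"; then
    echo "excluded"
    return 1
  fi

  # If the include file has any non-blank content, only listed hosts are allowed.
  if grep -q '[^[:space:]]' "$include_file" 2>/dev/null \
      && ! grep -Fxq -- "$host" "$include_file"; then
    echo "not-included"
    return 1
  fi

  echo "allowed"
}
```

Run on the master node, `check_node_lists "<worker hostname>"` prints `excluded`, `not-included`, or `allowed` for the name you pass in.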
After changing these configuration files, run the refresh nodes command:
yarn rmadmin -refreshNodes
You should then see the NodeManager rejoin YARN.
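To confirm that a worker has rejoined, you can look at the state YARN reports for it. The sketch below filters the output of `yarn node -list -all`. It assumes the first column is `<host>:<port>` and the second column is the node state, which may differ between Hadoop versions. `node_state` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the state YARN reports for one worker.
# node_state is a hypothetical helper. It reads `yarn node -list -all`
# output on stdin and assumes column 1 is <host>:<port> and column 2
# is the node state.

node_state() {
  awk -v host="$1" 'index($1, host ":") == 1 { print $2 }'
}

# Usage on the master node (commented out so the snippet is safe to source):
# yarn rmadmin -refreshNodes
# yarn node -list -all | node_state "<worker hostname>"
```

A worker that has re-registered should show up as `RUNNING`. If it still shows as decommissioned or is missing from the list, the include/exclude files are worth a second look.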