NodeManager is connected to ResourceManager, but DataNode is not connected to NameNode

I installed Hadoop 2.5.1 on CentOS 7.0.

(1) When I run a job on Hadoop, I get the error message below, mentioning the path "/tmp/hadoop-yarn/staging/hadoop/.staging/job_1424775783787_0001/files", and I suspect it is caused by a compatibility problem. If it is a compatibility issue, how can I fix it?

    15/02/24 20:27:41 ERROR streaming.StreamJob: Error Launching job : File /tmp/hadoop-yarn/staging/hadoop/.staging/job_1424775783787_0001/files/Formatter.sh could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.

(2) Port master:9000 is listening, but the DataNodes do not connect to the master, even though the NodeManagers are alive. So I can see active nodes in the ResourceManager web UI on port 8088, but no live DataNodes show up in the NameNode web UI on port 50070.
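
When DataNodes fail to register, a quick first check is whether the NameNode RPC port is actually reachable from each worker. This is a generic sketch (not from the original post); `check_port` is a hypothetical helper, and `XXX.XXX.XXX.65` is the master address from the configuration below:

```shell
# Run on each worker (mccb-com66 / mccb-com67): test TCP reachability of the
# NameNode RPC port using bash's built-in /dev/tcp pseudo-device.
check_port() {
  host=$1; port=$2
  if timeout 2 bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port NOT reachable"
  fi
}

check_port XXX.XXX.XXX.65 9000   # NameNode RPC port from core-site.xml
```

If the port is unreachable from the workers but listening on the master, the problem is network/firewall or hostname resolution rather than HDFS itself.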

The configuration is as follows.

/etc/hosts file

    XXX.XXX.XXX.65 mccb-com65 #server
    XXX.XXX.XXX.66 mccb-com66  #client01
    XXX.XXX.XXX.67 mccb-com67  #client02
    127.0.1.1      mccb-com65 (mccb-com66, mccb-com67 per computer setting)
    127.0.0.1      localhost

core-site.xml

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://XXX.XXX.XXX.65:9000</value>
      </property>
    </configuration>
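
One side note on this file: `fs.default.name` is deprecated in Hadoop 2.x in favor of `fs.defaultFS` (the old name still works through the deprecation mapping, so it is not the cause of the problem). The modern equivalent would be:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://XXX.XXX.XXX.65:9000</value>
  </property>
</configuration>
```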

hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <value>file:///home/hadoop/hdfs/hdfs/namenode</value>
        <description>the path where the file system image is saved</description>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>file:///home/hadoop/hdfs/hdfs/datanode</value>
        <description>the path where the datanode saves its blocks</description>
      </property>
      <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:50070</value>
      </property>
      <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>
      </property>
      <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:50020</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>
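
A frequent cause of "NodeManager registers but DataNode does not" is a clusterID mismatch: reformatting the NameNode gives it a new clusterID, while the DataNodes keep the old one in their `VERSION` file and are then rejected at registration. A sketch of the check, using the `dfs.name.dir` / `dfs.data.dir` paths from the file above (`check_cluster_ids` is a hypothetical helper, not a Hadoop command):

```shell
# Compare the clusterID recorded by the NameNode with the one a DataNode holds.
# Each argument is a path to a VERSION file (which contains a clusterID=... line).
check_cluster_ids() {
  nn_id=$(sed -n 's/^clusterID=//p' "$1" 2>/dev/null)
  dn_id=$(sed -n 's/^clusterID=//p' "$2" 2>/dev/null)
  if [ "$nn_id" = "$dn_id" ]; then
    echo "clusterID match: $nn_id"
  else
    echo "clusterID mismatch: namenode=$nn_id datanode=$dn_id"
  fi
}

# On this cluster the files would be (NameNode file on the master, DataNode
# file on each worker):
# check_cluster_ids /home/hadoop/hdfs/hdfs/namenode/current/VERSION \
#                   /home/hadoop/hdfs/hdfs/datanode/current/VERSION
```

On a mismatch, either delete the DataNode directory and let it re-register, or edit its `VERSION` file to match the NameNode's clusterID.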

mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx400m</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>0.0.0.0:10020</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>0.0.0.0:19888</value>
      </property>
      <property>
        <name>mapred.system.dir</name>
        <value>/home/hadoop/hdfs/hdfs/mapred/system</value>
        <final>true</final>
      </property>
      <property>
        <name>mapred.local.dir</name>
        <value>/home/hadoop/hdfs/hdfs/mapred/local</value>
        <final>true</final>
      </property>
    </configuration>
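
A side note on this file (not related to the DataNode problem): `mapred.system.dir` and `mapred.local.dir` are MRv1-era property names. Under YARN (`mapreduce.framework.name=yarn`) the 2.x names are the `mapreduce.*` equivalents, which the deprecation mapping translates to, e.g.:

```xml
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/home/hadoop/hdfs/hdfs/mapred/local</value>
</property>
```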

yarn-site.xml

    <configuration>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>XXX.XXX.XXX.65:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>XXX.XXX.XXX.65:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>XXX.XXX.XXX.65:8032</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>XXX.XXX.XXX.65:8088</value>
      </property>
      <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:8042</value>
      </property>
    </configuration>

    [root@mccb-com65 ~]# netstat -antlp
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address            Foreign Address          State        PID/Program name
    tcp        0      0 127.0.0.1:25             0.0.0.0:*                LISTEN       2868/master
    tcp        0      0 0.0.0.0:3389             0.0.0.0:*                LISTEN       1746/xrdp
    tcp        0      0 XXX.XXX.XXX.65:8030      0.0.0.0:*                LISTEN       10858/java
    tcp        0      0 XXX.XXX.XXX.65:8031      0.0.0.0:*                LISTEN       10858/java
    tcp        0      0 XXX.XXX.XXX.65:8032      0.0.0.0:*                LISTEN       10858/java
    tcp        0      0 0.0.0.0:8033             0.0.0.0:*                LISTEN       10858/java
    tcp        0      0 0.0.0.0:50885            0.0.0.0:*                LISTEN       2282/rpc.statd
    tcp        0      0 XXX.XXX.XXX.65:9000      0.0.0.0:*                LISTEN       10470/java
    tcp        0      0 0.0.0.0:50090            0.0.0.0:*                LISTEN       10684/java
    tcp        0      0 0.0.0.0:111              0.0.0.0:*                LISTEN       1753/rpcbind
    tcp        0      0 0.0.0.0:50070            0.0.0.0:*                LISTEN       10470/java
    tcp        0      0 127.0.0.1:3350           0.0.0.0:*                LISTEN       1745/xrdp-sesman
    tcp        0      0 0.0.0.0:22               0.0.0.0:*                LISTEN       1761/sshd
    tcp        0      0 127.0.0.1:631            0.0.0.0:*                LISTEN       3278/cupsd
    tcp        0      0 127.0.0.1:5911           0.0.0.0:*                LISTEN       3053/Xvnc
    tcp        0      0 0.0.0.0:8088             0.0.0.0:*                LISTEN       10858/java
    tcp        0      0 XXX.XXX.XXX.65:8031      XXX.XXX.XXX.67:44914     ESTABLISHED  10858/java
    tcp        0      0 XXX.XXX.XXX.65:42505     XXX.XXX.XXX.65:9000      TIME_WAIT    -
    tcp        0      0 127.0.0.1:5911           127.0.0.1:50271          ESTABLISHED  3053/Xvnc
    tcp        0      0 XXX.XXX.XXX.65:3389      XXX.XXX.XXX.96:52951     ESTABLISHED  1746/xrdp
    tcp        0      0 127.0.0.1:50271          127.0.0.1:5911           ESTABLISHED  1746/xrdp
    tcp        0      0 XXX.XXX.XXX.65:8031      XXX.XXX.XXX.66:46816     ESTABLISHED  10858/java
    tcp6       0      0 :::44331                 :::*                     LISTEN       2282/rpc.statd
    tcp6       0      0 :::111                   :::*                     LISTEN       1753/rpcbind
    tcp6       0      0 :::22                    :::*                     LISTEN       1761/sshd
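
Output like the above is easier to scan when reduced to just the listening ports. A small helper for that (`list_listen_ports` is an illustrative name, not part of Hadoop):

```shell
# list_listen_ports: given `netstat -antl`-style output on stdin, print the
# local port of every socket in LISTEN state, one per line.
list_listen_ports() {
  awk '$6 == "LISTEN" { n = split($4, a, ":"); print a[n] }' | sort -un
}

# Usage: netstat -antl | list_listen_ports
# On a healthy worker you would expect to see the DataNode ports
# 50010 (data transfer), 50020 (IPC) and 50075 (HTTP).
```

In the dump above, 9000 and 50070 are listening (NameNode up), the 8031 connections from .66 and .67 are the NodeManagers, but no DataNode ever connects to 9000 from a worker.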

The following steps should fix the problem. However, you may lose data.

  • Stop Hadoop:

    • sbin/stop-dfs.sh
    • sbin/stop-yarn.sh
  • Delete the NameNode and DataNode directories (the local paths configured in dfs.name.dir and dfs.data.dir):

    • rm -rf /home/hadoop/hdfs/hdfs/namenode
    • rm -rf /home/hadoop/hdfs/hdfs/datanode
  • Format the NameNode:

    • hdfs namenode -format
  • Start the NameNode, the DataNodes, and YARN:

    • sbin/start-dfs.sh
    • sbin/start-yarn.sh