Hadoop2 client on Windows for a Linux Cluster

We have a Linux Hadoop cluster, but for various reasons some Windows clients connect and push data to it. With Hadoop1 we were able to run hadoop through Cygwin. With Hadoop2, however, as stated on the website, Cygwin is not required or supported.

Questions

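  1. What exactly has changed? Why can the client (only) not run under Cygwin, or can it? Apart from paths, what other considerations are in play?
  2. Apart from the property below for job submission, is there anything else to consider for a Windows client interacting with a Linux cluster?

    conf.set("mapreduce.app-submission.cross-platform", "true");

  3. Extracting hadoop-2.6.0-cdh5.5.2 under Cygwin and running it with the correct configuration under $HADOOP_HOME/etc produces classpath-formation or class-not-found problems. For example, the following run: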

    hdfs dfs -ls
    Error: Could not find or load main class org.apache.hadoop.fs.FsShell
    

Looking at the classpath, it appears to contain Cygwin paths. I tried converting them to Windows paths so that the jars can be found.

In $HADOOP_HOME/etc/hdfs.sh, locate the dfs command and change it to:

    elif [ "$COMMAND" = "dfs" ] ; then
      if $cygwin; then
        CLASSPATH=`cygpath -p -w "$CLASSPATH"`
      fi
      CLASS=org.apache.hadoop.fs.FsShell

This results in the following:

    16/04/07 16:01:05 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
            at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
            at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
            at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
            at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
            at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
            at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
            at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
            at org.apache.hadoop.fs.FsShell.main(FsShell.java:362)
    16/04/07 16:01:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Warning: fs.defaultFs is not set when running "ls" command.
    Found 15 items
    -ls: Fatal internal error
    java.lang.NullPointerException
            at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
            at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
            at org.apache.hadoop.util.Shell.run(Shell.java:478)
            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
            at org.apache.hadoop.util.Shell.execCommand(Shell.java:831)
            at org.apache.hadoop.util.Shell.execCommand(Shell.java:814)
            at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1100)
            at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:582)
            at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getOwner(RawLocalFileSystem.java:565)
            at org.apache.hadoop.fs.shell.Ls.adjustColumnWidths(Ls.java:139)
            at org.apache.hadoop.fs.shell.Ls.processPaths(Ls.java:110)
            at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
            at org.apache.hadoop.fs.shell.Ls.processPathArgument(Ls.java:98)
            at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
            at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
            at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
            at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
            at org.apache.hadoop.fs.FsShell.run(FsShell.java:305)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
            at org.apache.hadoop.fs.FsShell.main(FsShell.java:362)

Given the above, should I keep trying to fix this so that I can reuse the existing client .sh scripts, or should I just convert them to .bat?

The problem is that Cygwin needs to return Windows paths rather than Cygwin paths. Also, winutils.exe needs to be installed in the path, as described here.
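
The `null\bin\winutils.exe` in the error above suggests that the JVM saw neither the HADOOP_HOME environment variable nor the hadoop.home.dir system property, since Hadoop builds that path from whichever of the two is set. A minimal pre-flight check, assuming winutils.exe is expected under $HADOOP_HOME/bin; the function name `check_winutils` is made up for this sketch:

```shell
#!/bin/bash
# Hypothetical helper (not part of Hadoop): verify that a Hadoop home is known
# and that it contains bin/winutils.exe. Takes the home directory as an optional
# first argument and falls back to $HADOOP_HOME.
check_winutils() {
    local home="${1:-$HADOOP_HOME}"
    if [ -z "$home" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$home/bin/winutils.exe" ]; then
        echo "winutils.exe not found in $home/bin" >&2
        return 2
    fi
    echo "found $home/bin/winutils.exe"
}
```

It could be run before any client command, for example `check_winutils && hdfs dfs -ls /`, so that a missing binary is reported up front instead of through a stack trace.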

Just fix the scripts so that they return actual Windows paths, and disable a few commands that do not run under Cygwin:

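    #!/bin/bash
    # Patch the Hadoop launcher scripts so they hand Windows paths to the JVM under Cygwin.
    # \$bin, \$CLASSPATH and \$HADOOP_PREFIX are escaped so that sed sees them literally;
    # $HADOOP_HOME is expanded when this script runs (assumed to be a Cygwin-style path).

    # fix $HADOOP_HOME/bin/hdfs:
    # comment out the lines that compute $bin, point libexec at $HADOOP_HOME,
    # and convert CLASSPATH with cygpath just before it is exported
    sed -i -e "s/bin=/#bin=/g" "$HADOOP_HOME/bin/hdfs"
    sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"$HADOOP_HOME/libexec\"#g" "$HADOOP_HOME/bin/hdfs"
    sed -i "/export CLASSPATH=\$CLASSPATH/i CLASSPATH=\`cygpath -p -w \"\$CLASSPATH\"\`" "$HADOOP_HOME/bin/hdfs"

    # fix $HADOOP_HOME/libexec/hdfs-config.sh (same $bin and libexec changes)
    sed -i -e "s/bin=/#bin=/g" "$HADOOP_HOME/libexec/hdfs-config.sh"
    sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"$HADOOP_HOME/libexec\"#g" "$HADOOP_HOME/libexec/hdfs-config.sh"

    # fix $HADOOP_HOME/libexec/hadoop-config.sh:
    # append an empty HADOOP_PREFIX= after the line that sets HADOOP_DEFAULT_PREFIX,
    # and convert HADOOP_PREFIX with cygpath before it is exported
    sed -i "/HADOOP_DEFAULT_PREFIX=/a HADOOP_PREFIX=" "$HADOOP_HOME/libexec/hadoop-config.sh"
    sed -i "/export HADOOP_PREFIX/i HADOOP_PREFIX=\`cygpath -p -w \"\$HADOOP_PREFIX\"\`" "$HADOOP_HOME/libexec/hadoop-config.sh"

    # fix $HADOOP_HOME/bin/hadoop (same changes as bin/hdfs)
    sed -i -e "s/bin=/#bin=/g" "$HADOOP_HOME/bin/hadoop"
    sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"$HADOOP_HOME/libexec\"#g" "$HADOOP_HOME/bin/hadoop"
    sed -i "/export CLASSPATH=\$CLASSPATH/i CLASSPATH=\`cygpath -p -w \"\$CLASSPATH\"\`" "$HADOOP_HOME/bin/hadoop"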
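
The classpath conversion that these patches add to the launcher scripts can also be tried in isolation. The sketch below wraps `cygpath -p -w` (where `-p` treats the argument as a colon-separated path list and `-w` requests Windows form) in a function; the name `to_windows_classpath` and the fallback for non-Cygwin shells are assumptions made for illustration, not something the Hadoop scripts provide:

```shell
#!/bin/bash
# Hypothetical helper: convert a colon-separated Cygwin classpath into its
# Windows form using cygpath. When cygpath is not on the PATH (that is, when
# not running under Cygwin), the input is printed unchanged.
to_windows_classpath() {
    local cp="$1"
    if command -v cygpath >/dev/null 2>&1; then
        cygpath -p -w "$cp"
    else
        printf '%s\n' "$cp"
    fi
}
```

A script could then use `CLASSPATH=$(to_windows_classpath "$CLASSPATH")` right before invoking java, which keeps the same script usable on both Cygwin and Linux.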