Compilation of Hadoop Java program with additional dependencies

I am trying to build a Hadoop program whose purpose is to cat a file I previously uploaded to HDFS, based largely on this tutorial. The program looks like this:

import java.io.*;
import java.net.URI;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;

public class ReadHDFS {
    public static void main(String[] args) throws IOException {

        String uri = args[0];

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;

        try
        {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        finally
        {
            IOUtils.closeStream(in);
        }   
    }
}

In my view the tutorial is flawed because, as I understand it, IOUtils is part of the apache.commons library. However, even though I added the following line to the program I have been trying to deploy:

import org.apache.commons.compress.utils.IOUtils;

I still get the following errors:

FileSystemCat.java:37: error: cannot find symbol
        IOUtils.copyBytes(in, System.out, 4096, false);
               ^
  symbol:   method copyBytes(InputStream,PrintStream,int,boolean)
  location: class IOUtils
FileSystemCat.java:40: error: cannot find symbol
        IOUtils.closeStream(in);
                            ^
  symbol:   variable in
  location: class FileSystemCat
2 errors

I am running the following command on the NameNode to compile it:

javac -cp /usr/local/hadoop/share/hadoop/common/hadoop-common-2.8.1.jar:/home/ubuntu/job_program/commons-io-2.5/commons-io-2.5.jar FileSystemCat.java 
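
It turned out that the two missing symbols are static helpers on Hadoop's own org.apache.hadoop.io.IOUtils, which ships inside hadoop-common-2.8.1.jar and is therefore already on the classpath above; as far as I can tell, neither org.apache.commons.compress.utils.IOUtils nor commons-io's IOUtils declares copyBytes or closeStream with these signatures. Swapping the import is enough. Below is a minimal sketch of the fixed program, here written with try-with-resources since FSDataInputStream is Closeable (the listing at the end of the post keeps the original try/finally form):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// Hadoop's own helper class (shipped in hadoop-common); it is this class that
// declares copyBytes(InputStream, OutputStream, int, boolean) and closeStream(Closeable).
import org.apache.hadoop.io.IOUtils;

public class ReadHDFS {
    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        // try-with-resources closes the stream even if copyBytes throws,
        // so an explicit closeStream(in) in a finally block is not needed here.
        try (FSDataInputStream in = fs.open(new Path(uri))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}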

The necessary addition to ~/.bashrc:

# Classpath for Java

# export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)

How the program below was compiled:

javac -cp ${HADOOP_CLASSPATH}:commons-io-2.5.jar ReaderHDFS.java

How the jar file for the program was generated:

jar cf rhdfs.jar ReaderHDFS*.class

The command to run it:

$HADOOP_HOME/bin/hadoop jar rhdfs.jar ReaderHDFS hdfs://master:9000/input_1/codes.txt

And here is the program:

import org.apache.hadoop.io.IOUtils;   // Hadoop's own IOUtils, not the commons one
//import org.apache.commons.io.IOUtils;
import java.io.*;
import java.net.URI;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;

public class ReaderHDFS {
    public static void main(String[] args) throws IOException {

        String uri = args[0];

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = null;

        try {
            // Open the HDFS path and stream its bytes to stdout.
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            // closeStream closes the stream quietly, even if copyBytes threw.
            IOUtils.closeStream(in);
        }
    }
}
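
A side note on the compile line: because copyBytes and closeStream come from org.apache.hadoop.io.IOUtils inside hadoop-common, the commons-io-2.5.jar entry should not actually be needed for this particular program; ${HADOOP_CLASSPATH} alone ought to be enough. And if you ever want to cat several HDFS paths in one run, the same pattern extends to a loop over the arguments; a hypothetical sketch (the class name ReaderHDFSMulti is mine, same dependencies as above):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical variant: cat every HDFS URI given on the command line, in order.
public class ReaderHDFSMulti {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        for (String uri : args) {
            // Each URI may resolve to a different FileSystem instance.
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(new Path(uri));
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }
}

It would be packaged and launched the same way as above with hadoop jar, just passing several hdfs:// URIs as arguments.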