Error opening job jar: file in HDFS

Question

I have been trying to fix this, but I can't tell what mistake I am making here. Could you help me resolve it? Thanks in advance!

My program:

package hadoopbook;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class WordCount {

    //Mapper
    public static class WcMapperDemo extends Mapper<LongWritable, Text, Text, IntWritable>{

        Text MapKey = new Text();
        IntWritable MapValue = new IntWritable();

        public void map(LongWritable key, Text Value, Context Context) throws IOException, InterruptedException{
            String Record = Value.toString();
            String[] Words = Record.split(",");

            for (String Word:Words){
                MapKey.set(Word);
                MapValue.set(1);
                Context.write(MapKey, MapValue);
            }   
        }
    }

    //Reducer
    public static class WcReducerDemo extends Reducer<Text, IntWritable, Text, IntWritable>{

        IntWritable RedValue = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> Values, Context Context) throws IOException, InterruptedException{
            int sum = 0;

            for (IntWritable Value:Values){
                sum = sum + Value.get();
            }
            RedValue.set(sum);
            Context.write(key, RedValue);
        }
    }

    //Driver
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        Configuration Conf = new Configuration();
        Job Job = new Job(Conf, "Word Count Job");

        Job.setJarByClass(WordCount.class);
        Job.setMapperClass(WcMapperDemo.class);
        Job.setReducerClass(WcReducerDemo.class);

        Job.setMapOutputKeyClass(Text.class);
        Job.setMapOutputValueClass(IntWritable.class);

        Job.setOutputKeyClass(Text.class);
        Job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(Job, new Path (args[0]));
        FileOutputFormat.setOutputPath(Job, new Path (args[1]));

        System.exit(Job.waitForCompletion(true) ? 0:1);
    }
}

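The jar file is on HDFS at the following location:

/user/cloudera/Programs/WordCount.jar

Permissions: rw-rw-rw-

The input file is at the following location:

/user/cloudera/Input/Words.txt

Permissions: rw-rw-rw-

The output folder is:

/user/cloudera/Output

When I try to run: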

[cloudera@localhost ~]$ hadoop jar /user/cloudera/Programs/WordCount.jar hadoopbook.WordCount /user/cloudera/Input/Words.txt /user/cloudera/Output

After this I get the following error, and I am stuck here:

Exception in thread "main" java.io.IOException: Error opening job jar: /user/cloudera/Programs/WordCount.jar
    at org.apache.hadoop.util.RunJar.main(RunJar.java:135)
Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:127)
    at java.util.jar.JarFile.<init>(JarFile.java:135)
    at java.util.jar.JarFile.<init>(JarFile.java:72)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:133)

Answer 1

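The jar needs to be on the local file system, not in HDFS: hadoop jar opens the jar file from the local disk of the machine where you run the command, so a path that exists only in HDFS cannot be opened. You also need to give the fully qualified name of the main class (package name plus class name).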
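
To see why the HDFS path fails, note that the stack trace in the question shows RunJar opening the jar through java.util.jar.JarFile, which only reads the local file system. The sketch below reproduces that check in plain Java. LocalJarCheck and canOpenAsLocalJar are hypothetical names made up for this illustration, not part of Hadoop, and the default path is simply the one from the question.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Hadoop): checks whether a path can be
// opened as a jar on the LOCAL file system, the same way the stack trace
// shows RunJar doing it via java.util.jar.JarFile.
final class LocalJarCheck {

    // Returns true if the path names a readable jar/zip archive on local disk.
    static boolean canOpenAsLocalJar(String path) {
        try (JarFile jar = new JarFile(path)) {
            return true;
        } catch (IOException e) {
            // Raised when the file is missing locally or is not a valid zip archive.
            System.err.println("Cannot open " + path + " as a local jar: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: default to the path from the question when none is given.
        String path = args.length > 0 ? args[0] : "/user/cloudera/Programs/WordCount.jar";
        if (canOpenAsLocalJar(path)) {
            System.out.println(path + " can be opened locally; hadoop jar should be able to read it.");
        } else {
            System.out.println(path + " is not a local jar; copy it out of HDFS first.");
        }
    }
}
```

If the check fails for the path you pass to hadoop jar, one option is to copy the jar out of HDFS first (for example with hadoop fs -get /user/cloudera/Programs/WordCount.jar .) and then run hadoop jar WordCount.jar hadoopbook.WordCount with the same input and output paths, which stay on HDFS.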
