Unable to run map reduce job in hadoop 2.7 - Type Mismatch

When running the program I get: Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable

I have tried several suggestions from Google and Stack Overflow, but no luck; I still get the same exception. Any idea what I'm missing?

My imports:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

My Map class:

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> 
{
    Text k = new Text();


    public void map(Text key, Iterable<IntWritable> value, Context context) 
                throws IOException, InterruptedException {
        String line = value.toString(); 
        StringTokenizer tokenizer = new StringTokenizer(line," "); 
        while (tokenizer.hasMoreTokens()) { 
            String year= tokenizer.nextToken();
            k.set(year);
            String temp= tokenizer.nextToken().trim();
            int v = Integer.parseInt(temp); 
            context.write(k,new IntWritable(v)); 
        }
    }
}

My Reduce class:

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
{

    public void reduce (Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxtemp=0;
        for(IntWritable it : values) {
            int temperature= it.get();
            if(maxtemp<temperature)
            {
                maxtemp =temperature;
            }
        }
        context.write(key, new IntWritable(maxtemp)); 
    }
}

And the main method:

Configuration conf = new Configuration();

Job job = new Job(conf, "MaxTemp");
job.setJarByClass(MaxTemp.class);
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

Path outputPath = new Path(args[1]);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

outputPath.getFileSystem(conf).delete(outputPath);

System.exit(job.waitForCompletion(true) ? 0 : 1);

(I compiled this code with Java 7 in the Eclipse IDE (Mars) and exported it as a runnable JAR; the Hadoop version is 2.7.0.)

If you add the @Override annotation to your map function, you'll find that it doesn't actually override the map method in Mapper.
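For example, adding it to the signature from your posted code makes the compiler flag the problem immediately (a sketch; the body is irrelevant here):

// This no longer compiles: no map method in
// Mapper<LongWritable, Text, Text, IntWritable> has this parameter list.
@Override
public void map(Text key, Iterable<IntWritable> value, Context context)
        throws IOException, InterruptedException {
    // error: method does not override or implement a method from a supertype
}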

If you look at the Javadoc for Mapper (link here), you'll see that the map method should look like this:

map(KEYIN key, VALUEIN value, org.apache.hadoop.mapreduce.Mapper.Context context)

Yours looks like this:

map(Text key, Iterable<IntWritable> value, Context context)

So yours should be:

map(LongWritable key, Text value, Context context)

Because you aren't actually overriding the base map method in Mapper, your method is never called; instead, the framework uses the one in Mapper, which looks like:

protected void map(KEYIN key, VALUEIN value, 
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
}

This takes in the LongWritable and Text and writes them straight back out (an identity mapper), which doesn't match the Text and IntWritable you've told the job they should be.
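Putting that together, a corrected version of your mapper would look something like this (a sketch based on your posted code; the tokenizing logic is unchanged):

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable>
{
    Text k = new Text();

    @Override // now genuinely overrides Mapper.map, so the framework will call it
    public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
        // value is one line of the input split
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line, " ");
        while (tokenizer.hasMoreTokens()) {
            String year = tokenizer.nextToken();
            k.set(year);
            String temp = tokenizer.nextToken().trim();
            int v = Integer.parseInt(temp);
            context.write(k, new IntWritable(v));
        }
    }
}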

In your driver, these lines:

job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);

should instead be:

job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

You need to use your own implementations, not the base classes.
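A minimal sketch of the corrected driver, assuming the rest of your setup stays the same (Job.getInstance is the non-deprecated factory method in Hadoop 2.x, and I've also set the final output key/value classes explicitly):

Configuration conf = new Configuration();

Job job = Job.getInstance(conf, "MaxTemp");
job.setJarByClass(MaxTemp.class);
job.setMapperClass(Map.class);      // your implementation, not the base Mapper
job.setReducerClass(Reduce.class);  // your implementation, not the base Reducer

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

System.exit(job.waitForCompletion(true) ? 0 : 1);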

Your mapper definition public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> declares the input key as LongWritable and the input value as Text.

But your map method public void map(Text key, Iterable<IntWritable> value, Context context) declares Text as the key and Iterable<IntWritable> as the value.

So your map method should be declared as public void map(LongWritable key, Text value, Context context).