Hadoop MapReduce Mapper reading incorrect value from text file

Question

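I'm writing a MapReduce program that processes a text file and appends a string to each line. The problem I'm facing is that the Text value passed to the mapper's map method is incorrect.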

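Whenever a line in the file is shorter than the previous line, some extra characters are automatically appended to it, so that it ends up exactly as long as the previous line.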

The map method signature is:

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {

I'm logging the value inside the map method and can see this behavior there. Any pointers?

Code snippets

Driver

Configuration configuration = new Configuration();
configuration.set("CLIENT_ID", "Test");
Job job = Job.getInstance(configuration, JOB_NAME);
job.setJarByClass(JobDriver.class);
job.setMapperClass(AdwordsMapper.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

System.exit(job.waitForCompletion(true) ? 0 : 1);


Mapper

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AdwordsMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String textLine = new String(value.getBytes());

        textLine = new StringBuffer(textLine).append(",")
                .append(context.getConfiguration().get("CLIENT_ID")).toString();
        context.write(new Text(""), new Text(textLine));

    }

}

Answer 1

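As far as I can tell, the problem in your mapper is the call to getBytes(). Text.getBytes() returns the raw backing byte array, and only the first getLength() bytes of that array are valid. The record reader reuses the same Text object for every line, so when a shorter line follows a longer one, the tail of the array still holds bytes from the earlier line, and new String(value.getBytes()) decodes those stale bytes too.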
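The effect can be reproduced without Hadoop. The sketch below is a minimal plain-Java illustration, assuming a simplified model of how `Text` behaves: `ReusableLineBuffer` is a hypothetical stand-in (not Hadoop's class) that keeps one backing array, grows it only when a longer line arrives, and tracks how many leading bytes are valid. Decoding the whole array reproduces the "padded" line from the question; decoding only the valid prefix does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReusedBufferDemo {

    // Hypothetical, simplified stand-in for org.apache.hadoop.io.Text:
    // a reused backing array that only grows, plus a length field that
    // marks how many leading bytes belong to the current line.
    static final class ReusableLineBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        void set(String line) {
            byte[] data = line.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < data.length) {
                // Grow only when needed; existing contents are never cleared.
                bytes = Arrays.copyOf(bytes, data.length);
            }
            System.arraycopy(data, 0, bytes, 0, data.length);
            length = data.length;
        }

        // Mirrors Text.getBytes(): the whole array, stale tail included.
        byte[] getBytes() {
            return bytes;
        }

        int getLength() {
            return length;
        }

        // Mirrors Text.toString(): decodes only the valid prefix.
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ReusableLineBuffer value = new ReusableLineBuffer();
        value.set("hello world");
        value.set("hi"); // a shorter line written into the same object

        String buggy = new String(value.getBytes(), StandardCharsets.UTF_8);
        String fixed = value.toString();
        System.out.println(buggy); // hillo world
        System.out.println(fixed); // hi
    }
}
```

The second line "hi" overwrites only the first two bytes, so decoding the full array yields "hillo world": the shorter line followed by leftovers of the longer one, which matches the symptom described in the question.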

Instead of this:

   String textLine = new String(value.getBytes());

try this:

   String textLine = value.toString();
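If you need to work from the raw bytes rather than calling toString(), the key is to decode only the first getLength() bytes; with Hadoop's Text that would be `new String(value.getBytes(), 0, value.getLength(), StandardCharsets.UTF_8)`. The sketch below shows that idea in plain Java so it runs without Hadoop on the classpath. The helper names `decodeValid` and `appendClientId` are hypothetical, and the hard-coded byte array simulates a reused buffer whose valid prefix is 2 bytes.

```java
import java.nio.charset.StandardCharsets;

public class ValidPrefixDecode {

    // Decode only the first `length` bytes of a possibly reused buffer.
    // For well-formed UTF-8 this gives the same result as Text.toString().
    static String decodeValid(byte[] raw, int length) {
        return new String(raw, 0, length, StandardCharsets.UTF_8);
    }

    // Appends ",<clientId>" to a line, as the question's mapper does,
    // using StringBuilder, the unsynchronized counterpart of StringBuffer.
    static String appendClientId(String line, String clientId) {
        return new StringBuilder(line).append(',').append(clientId).toString();
    }

    public static void main(String[] args) {
        // Backing array left over from "hello world" after "hi" was written.
        byte[] raw = "hillo world".getBytes(StandardCharsets.UTF_8);
        int length = 2;

        String line = decodeValid(raw, length);
        System.out.println(appendClientId(line, "Test")); // hi,Test
    }
}
```

In the actual mapper, value.toString() already performs this bounded decode, which is why the one-line change above is enough.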


Tags: java, hadoop, mapreduce