InputFormat Decision

I'm trying to figure out which of the given answers best fits this question:

Given a directory of files with the following structure: line number, tab character, string:

Example:

1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj

You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line: conf.setInputFormat (____.class) ; ?

A. SequenceFileAsTextInputFormat

B. SequenceFileInputFormat

C. KeyValueFileInputFormat

D. BDBInputFormat

My analysis:

Option A is a format I found does exist, but I'm not sure how it is meant to be used or whether it fits as the answer.

Option B is not possible, because SequenceFiles are binary files of (K, V) pairs, so it doesn't fit here.

Option C is not possible because there is no KeyValueFileInputFormat. However, if this is a typo and it is actually KeyValueTextInputFormat, then I think it would be a good choice. Or not?

Option D is not possible because there is no BDBInputFormat. Even if it were a typo for BDInputFormat, it wouldn't fit this case.

Thanks!

As you guessed, option C is probably a typo; it should be https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/KeyValueTextInputFormat.html.

For more details, see: How to specify KeyValueTextInputFormat Separator in Hadoop-.20 api?

The answer is option C. It is probably a typo.

KeyValueTextInputFormat splits each line at the first TAB character, so the line number becomes the key and the string becomes the value.
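To make the splitting rule concrete, here is a rough plain-Java sketch with no Hadoop dependency. It only mimics the behavior of the record reader behind KeyValueTextInputFormat, assuming the default tab separator: split at the first separator, and if there is none, the whole line is the key and the value is empty. `KeyValueSplitSketch` and `split` are hypothetical names for illustration, not part of Hadoop's API. In a real job you would instead call `conf.setInputFormat(KeyValueTextInputFormat.class)`, and the Mapper would receive `Text` keys and `Text` values.

```java
import java.util.List;

/**
 * Hypothetical plain-Java sketch (no Hadoop dependency) of how each line is
 * turned into one key/value record: split at the FIRST separator (tab by
 * default). If no separator is present, the whole line becomes the key and
 * the value is empty.
 */
public class KeyValueSplitSketch {

    /** Returns {key, value} for a single input line. */
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);          // first occurrence only
        if (pos == -1) {
            return new String[] {line, ""};         // no separator: empty value
        }
        // Everything after the first separator (including later tabs) is the value.
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        // The sample lines from the question, with the tab between number and string.
        List<String> lines = List.of(
                "1\tabialkjfjkaoasdfjksdlkjhqweroij",
                "2\tkadfjhuwqounahagtnbvaswslmnbfgy",
                "3\tkjfteiomndscxeqalkzhtopedkfsikj");
        for (String line : lines) {
            String[] kv = split(line, '\t');
            // One record per line: key = line number, value = string.
            System.out.println("key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```

Running `main` prints one `key=... value=...` line per input line, for example `key=1 value=abialkjfjkaoasdfjksdlkjhqweroij`. Splitting only at the first tab means a string that itself contains tabs stays intact in the value. The separator can be changed through configuration, as discussed in the linked question.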
