Python streaming MapReduce job on Hadoop failed - missing log4j?
I am trying to run a Python word count on Hadoop 2.7.1 installed on Ubuntu 15.10, but I get the following error:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
In addition, I get a RuntimeException in the terminal, a message that the streaming job failed, and no output file is produced.
I found some threads saying that log4j.properties and log4j.xml may be missing, along with examples of what log4j.properties should contain; I tried one of those examples with no luck. Where can I find these files in the Hadoop directory (if they exist at all), or how do I create them with the correct configuration?
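For reference, the kind of minimal log4j.properties those threads suggest (placed in the Hadoop conf directory, e.g. etc/hadoop/log4j.properties) looks roughly like this; it is a generic console-logging sketch, not necessarily the exact example I tried:

# Send everything at INFO and above to the console (stderr)
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n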
The mapper and reducer code for the word count was taken from here, and it runs absolutely fine locally with:
cat input.txt | ./mapper.py | sort | ./reducer.py
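(The linked scripts are the usual streaming word count; roughly, they look like the sketch below, though the exact code is in the linked tutorial.)

#!/usr/bin/env python
# mapper.py: read lines from stdin and emit "word<TAB>1" for every word
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))

#!/usr/bin/env python
# reducer.py: input arrives sorted by word, so counts for the same word are adjacent
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.strip().split('\t', 1)
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_word = word
        current_count = count

if current_word is not None:
    print('%s\t%s' % (current_word, current_count))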
However, I have tried running it on Hadoop several times and it failed every time. I tried different commands, both with the Python files copied to HDFS and with them on the local file system.
This did not work:
hadoop hadoop-streaming-2.7.1.jar -mapper /user/mapper.py -reducer /user/reducer.py -input /input_file.txt -output /user/output
Nor did this:
hadoop hadoop-streaming-2.7.1.jar -mapper "python /user/mapper.py" -reducer "python /user/reducer.py" -input/input_file.txt -output /user/output
This one did work (with the Python files on the local file system):
hadoop hadoop-streaming-2.7.1.jar -mapper "python /home/user_name/Documents/mapper.py" -reducer "python /home/user_name/Documents/reducer.py -input /user/input_file.txt -output /user/output
All the files have the correct permissions.
The output after the standard start-up messages is as follows:
16/02/15 09:47:48 INFO mapreduce.Job: map 0% reduce 0%
16/02/15 09:48:05 INFO mapreduce.Job: Task Id : attempt_1455529218252_0001_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/user/mr/mapper.py": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 24 more
There is a lot more of this, but the final output is about the streaming job failing:
16/02/15 09:49:07 INFO mapreduce.Job: Counters: 13
Job Counters
Failed map tasks=7
Killed map tasks=1
Launched map tasks=8
Other local map tasks=6
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=135543
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=135543
Total vcore-seconds taken by all map tasks=135543
Total megabyte-seconds taken by all map tasks=138796032
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
16/02/15 09:49:07 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
What could be the reason that the Python code does not work when called from HDFS?
You should provide only the names of the local Python files as the arguments to -mapper and -reducer. They do not need to be on HDFS, and you should not pass a command string to execute the scripts. You also need to provide a -file argument for each script. Try:
hadoop jar hadoop-streaming-2.7.1.jar -file /home/user_name/Documents/mapper.py -file /home/user_name/Documents/reducer.py -mapper /home/user_name/Documents/mapper.py -reducer /home/user_name/Documents/reducer.py -input /input_file.txt -output /user/output
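For the scripts to be run directly like this (without a "python" prefix), they typically also need #!/usr/bin/env python as their first line and execute permission, e.g.:

chmod +x /home/user_name/Documents/mapper.py /home/user_name/Documents/reducer.py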