Apache Pig: Job in state DEFINE instead of RUNNING
I am using Apache Pig. I am trying to load a comma-separated file as a Pig table. It does not throw any error while loading the file.
But when I try to print the table using the "dump" command, I get the error below.
The file I loaded:
Error,fdgdf
Error,dfgdf
Error,dfgdf
Info,dfgdf
Info,dfgdf
Info,dfgdf
Info,dfgdf
Info,dfgdf
Info,dfgdf
Debug,dfgdf
Debug,dfgdf
Debug,dfgdf
Debug,dfgdf
Debug,dfgdf
Debug,dfgdf
Command used to load the file:
logFile1 = LOAD 'PigTestFile' using PigStorage();
Command used to print the table:
dump logFile1;
The error I get:
Failed Jobs:
JobId Alias Feature Message Outputs
job_1454617624671_0152 logFile1 MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs:
//ip-172-31-53-48.ec2.internal:8020/user/e1681fe26eed362777aabca1682510/PigTestFile
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.run(MapReduceLauncher.java:276)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-53-48.ec2.internal:8020/user/e1681fe26eed362777aabca1682510/PigTestFile
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
... 18 more
hdfs://ip-172-31-53-48.ec2.internal:8020/tmp/temp1258481141/tmp-1928081547,
:
:
2016-02-07 06:31:20,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-02-07 06:31:20,107 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias logFile1. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
[Edit]
When I read the logs carefully, I found that it could not find the file used to load the table. It expected the file to be in HDFS, whereas my file was on my local box.
I then moved the file into HDFS and ran the same commands. It worked fine.
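For completeness, moving the file was done with the usual hdfs dfs commands, along these lines (the user directory below is just a placeholder for my HDFS home directory):
hdfs dfs -mkdir -p /user/<my-user>              # create the HDFS home directory if it does not already exist
hdfs dfs -put PigTestFile /user/<my-user>/      # copy the local file into HDFS
hdfs dfs -ls /user/<my-user>/PigTestFile        # verify that Pig's relative path now resolves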
But why was no error reported when the "Load" command was executed??
The Map/Reduce job for a script is triggered only when a STORE/DUMP is encountered.
In this case, the map phase for the LOAD command starts only when a STORE/DUMP is encountered in the script.
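A minimal sketch of this lazy behaviour in the grunt shell (the path is made up): the LOAD of a non-existent file is accepted without complaint, and the failure only surfaces when DUMP actually launches a job:
grunt> bad = LOAD 'does_not_exist' USING PigStorage();   -- only parsed and validated, no job is submitted yet
grunt> DUMP bad;                                          -- a job is submitted here and fails with "Input path does not exist"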
The default execution mode is mapreduce. If the file is on a local path, you should run Pig in local mode:
pig -x local {pigfilename.pig}
Reference: https://pig.apache.org/docs/r0.9.1/start.html#execution-modes
Excerpt from the link above:
Pig has two execution modes or exectypes:
Local Mode - To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local).
Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode; you can, but don't need to, specify it using the -x flag (pig OR pig -x mapreduce).
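For example, a sketch assuming the file still sits in the local working directory: the same statements work against the local file once Pig is started in local mode.
pig -x local
grunt> logFile1 = LOAD 'PigTestFile' USING PigStorage(',');   -- ',' so the comma-separated fields are actually split (PigStorage() defaults to tab)
grunt> DUMP logFile1;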
As Murali explained in his answer (which I have accepted), the Map/Reduce job for a script is triggered only when a STORE/DUMP is encountered.
Some more explanation on this:
In general, Pig processes Pig Latin statements as follows:
First, Pig validates the syntax and semantics of all statements.
Next, if Pig encounters a DUMP or STORE, Pig will execute those statements.
In this example, Pig will validate, but not execute, the LOAD and FOREACH statements.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;
In this example, Pig will validate and execute the LOAD, FOREACH, and DUMP statements.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;
DUMP B;
(John)
(Mary)
(Bill)
(Joe)