How to execute a Pig file

Question

I have a simple CSV file.

When I run some code this way in the Grunt shell:

grunt> SET job.name 'this_and_that';
grunt> SET mapreduce.job.queuename adhoc;
grunt> SET default_parallel 50;
grunt> index_row = load 'nmbr.csv' as (number:int);
grunt> dump index_row;

I get the correct result:

(1)
(2)
(3)
(4)

But when I save the same code in a file, test.pig:

SET job.name 'this_and_that';
SET mapreduce.job.queuename adhoc;
SET default_parallel 50;
index_row = load 'nmbr.csv' as (number:int);
dump index_row;

and try to run it like this:

$ pig -x mapreduce hdfs://nameservice1/user/evkuzmin/test.pig

I get this output (and no dump result):

17/01/11 16:14:14 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
17/01/11 16:14:14 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/01/11 16:14:14 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-01-11 16:14:14,306 [main] INFO  org.apache.pig.Main - Apache Pig version 0.16.0.2.5.0.0-1245 (rexported) compiled Aug 26 2016, 02:07:35
2017-01-11 16:14:14,307 [main] INFO  org.apache.pig.Main - Logging error messages to: /export/home/evkuzmin/pig_1484140454299.log
2017-01-11 16:14:20,083 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /export/home/evkuzmin/.pigbootup not found
2017-01-11 16:14:20,301 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://nameservice1
2017-01-11 16:14:20,401 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-test.pig-b92d8d10-6d6c-4018-b55c-da85716c482b
2017-01-11 16:14:21,549 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hd-has011.vimpelcom.ru:8188/ws/v1/timeline/
2017-01-11 16:14:21,571 [main] INFO  org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2017-01-11 16:14:26,403 [main] INFO  org.apache.pig.Main - Pig script completed in 12 seconds and 711 milliseconds (12711 ms)

I tried to look for errors in the log file it mentions:

/export/home/evkuzmin/pig_1484140454299.log

but that file does not exist.

Answer 1

Don't put your test.pig in an HDFS location.

Instead, keep test.pig on the local file system and change the load location to the full HDFS path:

SET job.name 'this_and_that';
SET mapreduce.job.queuename adhoc;
SET default_parallel 50;
index_row = load 'hdfs://nameservice1/user/evkuzmin/nmbr.csv' as (number:int);
dump index_row;

Then run test.pig from the local file system, but still in MapReduce mode:

pig -x mapreduce your/local/path/to/test.pig
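If the script itself has to live in HDFS, one way to follow this advice is to copy it down to a local temporary file before each run. The following is a minimal sketch, assuming the `hdfs` and `pig` command-line clients are installed and on PATH on the machine you run it from; the function name `run_pig_script_from_hdfs` is hypothetical. As in the answer, the LOAD paths inside the script are expected to be full HDFS URIs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a Pig script from HDFS into a local temp
# directory, run the local copy in MapReduce mode, then clean up.
# Assumes the `hdfs` and `pig` clients are on PATH.

run_pig_script_from_hdfs() {
  local hdfs_script="$1"   # e.g. hdfs://nameservice1/user/evkuzmin/test.pig
  local local_dir local_script status

  local_dir="$(mktemp -d)"                                # scratch directory
  local_script="${local_dir}/$(basename "$hdfs_script")"  # e.g. .../test.pig

  # Copy the script out of HDFS onto the local file system.
  if ! hdfs dfs -get "$hdfs_script" "$local_script"; then
    rm -rf "$local_dir"
    return 1
  fi

  # Run the local copy; the job itself still executes on the cluster.
  pig -x mapreduce "$local_script"
  status=$?

  rm -rf "$local_dir"
  return "$status"
}
```

Usage: `run_pig_script_from_hdfs hdfs://nameservice1/user/evkuzmin/test.pig`. The function returns Pig's exit status, so it can be chained in other scripts.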

hadoop

apache-pig