How to JOIN and find values in Pig?
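*Join these two tables and find the customer IDs in the NDakota region with an amount greater than 1600.*

customers:
```
1 Alaska Robert
2 Boston Lily
3 NDakota Michael
4 NDakota Will
5 Dakota Mark
```

sales:
```
1A 1 2012-09-09 1200
2A 2 2016-08-09 3400
3B 3 2016-04-05 2300
```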
```pig
customers = LOAD '/home/vis/Documents/customers' using PigStorage(' ') AS(cust_id:int,region:chararray,name:chararray);
sales = LOAD '/home/vis/Documents/sales' using PigStorage(' ')
AS(sales_id:int,cust_id:int,date:datetime,amount:int);
salesNA = FILTER customers BY region =='NDakota';
joined = JOIN sales BY cust_id,salesNA BY cust_id;
grouped = GROUP joined BY cust_id;
summed= FOREACH grouped GENERATE GROUP,SUM(sales.amount);
bigSpenders= FILTER summed BY 1$>1600;
DUMP sorted;
```
I get an error.
From the Apache Pig documentation:

> Use the disambiguate operator ( :: ) to identify field names after JOIN, COGROUP, CROSS, or FLATTEN operators.
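Here is a minimal sketch of that rule. The relation names A, B, C, D, the files a.txt and b.txt, and the aliases a_y and b_y are all hypothetical. When both inputs of a JOIN have a field with the same name, the joined relation keeps both copies, and each one must be qualified with the alias of its input. A field that appears in only one input can still be referenced by its bare name.

```pig
-- Hypothetical inputs: A has fields (x, y), B has fields (y, z).
A = LOAD 'a.txt' USING PigStorage(' ') AS (x:int, y:int);
B = LOAD 'b.txt' USING PigStorage(' ') AS (y:int, z:int);
C = JOIN A BY x, B BY y;
-- y exists in both inputs, so it must be written A::y or B::y;
-- x and z are unique, so the :: prefix is optional for them.
D = FOREACH C GENERATE x, A::y AS a_y, B::y AS b_y, z;
DUMP D;
```

In the question's script, cust_id is this kind of field. It comes from both sales and salesNA, so `GROUP joined BY cust_id` has to name one copy explicitly, for example `sales::cust_id`.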
The snippet below should be enough to achieve the objective. Let me know if you run into any issues.
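```pig
customers = LOAD 'customers.txt' using PigStorage(' ') AS(cust_id:int,region:chararray,name:chararray);
-- sales_id is loaded as chararray because values such as 1A are not integers
sales = LOAD 'sales.txt' using PigStorage(' ') AS(sales_id:chararray,cust_id:int,date:chararray,amount:int);
-- keep only customers in the NDakota region
custNA = FILTER customers BY region =='NDakota';
-- after the JOIN, cust_id exists twice: sales::cust_id and custNA::cust_id
joined = JOIN sales BY cust_id,custNA BY cust_id;
-- amount comes only from sales, so it needs no :: prefix
req_data = FILTER joined BY amount > 1600;
DUMP req_data;
```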
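The snippet above filters individual sales. If the goal is each customer's total, as the question's GROUP/SUM attempt suggests, the following is a sketch of that approach. It reuses the file names from the answer, and the output names cust_id and total are illustrative choices. It differs from the question's script in several ways. First, cust_id is qualified with `::` because both join inputs provide it. Second, after GROUP the relation holds `group` and a bag named `joined`, so the amounts are projected from `joined` rather than from `sales`. Third, positional fields are written `$1` and not `1$`. Finally, it dumps `bigSpenders` and not the undefined `sorted`.

```pig
customers = LOAD 'customers.txt' USING PigStorage(' ')
    AS (cust_id:int, region:chararray, name:chararray);
-- sales_id is chararray because values such as 1A are not integers
sales = LOAD 'sales.txt' USING PigStorage(' ')
    AS (sales_id:chararray, cust_id:int, date:chararray, amount:int);

custNA = FILTER customers BY region == 'NDakota';
joined = JOIN sales BY cust_id, custNA BY cust_id;

-- cust_id comes from both inputs, so name one copy explicitly
grouped = GROUP joined BY sales::cust_id;

-- group is the grouping key; joined is the bag of rows for that key
summed = FOREACH grouped GENERATE group AS cust_id, SUM(joined.sales::amount) AS total;

-- the positional form would be: FILTER summed BY $1 > 1600
bigSpenders = FILTER summed BY total > 1600;
DUMP bigSpenders;
```

With the sample data from the question, only customer 3 (NDakota, one sale of 2300) passes both filters, so this should print `(3,2300)`. Customer 4 is also in NDakota, but it has no sales, so the inner JOIN drops it.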