Single Column Value from Row in Pig
I have data like this:
1, 0, 0
0, 1, 0
0, 0, 1
The output I need is:
1, 1, 1
How can I do this in Pig?
Input:
1, 0, 0
0, 1, 0
0, 0, 1
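Add a new field holding the same constant value to every row, group by that key so all rows land in one group, and take the MAX of each column: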
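-- Load the comma-separated file as three integer columns.
records = LOAD '/user/cloudera/records.txt' USING PigStorage(',') AS (c1:int, c2:int, c3:int);
-- Attach the same constant key to every row so that all rows fall into one group.
records_each = FOREACH records GENERATE 'KEY' AS grouping_key, c1, c2, c3;
records_grp = GROUP records_each BY grouping_key;
-- Take the column-wise maximum across that single group.
records_grp_each = FOREACH records_grp GENERATE MAX(records_each.c1) AS c1, MAX(records_each.c2) AS c2, MAX(records_each.c3) AS c3;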
Output:
(1,1,1)
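As a possible simplification, Pig's GROUP ... ALL puts every row into a single group without needing a constant key field. The sketch below assumes the same file path and schema as the script above; the relation names records_all and col_max are illustrative only, and it has not been run against the original data.

```pig
-- Load the comma-separated file as three integer columns (same assumptions as above).
records = LOAD '/user/cloudera/records.txt' USING PigStorage(',') AS (c1:int, c2:int, c3:int);

-- GROUP ... ALL collects every row into one group keyed by the literal 'all'.
records_all = GROUP records ALL;

-- Column-wise maximum over the bag of all rows.
col_max = FOREACH records_all GENERATE
    MAX(records.c1) AS c1,
    MAX(records.c2) AS c2,
    MAX(records.c3) AS c3;

DUMP col_max;
```

Either way, all rows end up in a single group, so this approach suits cases where one summary row is wanted for the whole relation.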