Generate MAX for every column in a group bag
I have these relations:
a01x = FOREACH a01 GENERATE ndggr, 1 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS c5, 0 AS c6, 0 AS c7, 0 AS c8;
a02x = FOREACH a02 GENERATE ndggr, 0 AS c1, 1 AS c2, 0 AS c3, 0 AS c4, 0 AS c5, 0 AS c6, 0 AS c7, 0 AS c8;
a03x = FOREACH a03 GENERATE ndggr, 0 AS c1, 0 AS c2, 1 AS c3, 0 AS c4, 0 AS c5, 0 AS c6, 0 AS c7, 0 AS c8;
a04x = FOREACH a04 GENERATE ndggr, 0 AS c1, 0 AS c2, 0 AS c3, 1 AS c4, 0 AS c5, 0 AS c6, 0 AS c7, 0 AS c8;
a05x = FOREACH a05 GENERATE ndggr, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 1 AS c5, 0 AS c6, 0 AS c7, 0 AS c8;
a06x = FOREACH a06 GENERATE ndggr, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS c5, 1 AS c6, 0 AS c7, 0 AS c8;
a07x = FOREACH a07 GENERATE ndggr, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS c5, 0 AS c6, 1 AS c7, 0 AS c8;
a08x = FOREACH a08 GENERATE ndggr, 0 AS c1, 0 AS c2, 0 AS c3, 0 AS c4, 0 AS c5, 0 AS c6, 0 AS c7, 1 AS c8;
aunion = UNION a01x, a02x, a03x, a04x, a05x, a06x, a07x, a08x;
agroups = GROUP aunion BY ndggr;
Given that ndggr is my key, I want to obtain a relation in which each tuple looks like
001, 1, 0, 0, 0, 1, 1, 0, 1
155, 0, 0, 0, 1, 1, 0, 1, 1
200, 1, 0, 0, 0, 0, 0, 0, 1
So for each group I want something like
ndggr, MAX(c1), MAX(c2), ... , MAX(c8)
How can I get this?
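Consider reading some basic Apache Pig documentation. After GROUP, each tuple of agroups has two fields: group (the ndggr key) and a bag named aunion, after the relation that was grouped. The aggregates therefore have to be applied to aunion.c1 and so on, not to agroups.c1:
-- one output tuple per ndggr, holding the maximum of each flag column in its bag
maxes = FOREACH agroups GENERATE
    group AS ndggr,
    MAX(aunion.c1) AS c1_max,
    MAX(aunion.c2) AS c2_max,
    MAX(aunion.c3) AS c3_max,
    MAX(aunion.c4) AS c4_max,
    MAX(aunion.c5) AS c5_max,
    MAX(aunion.c6) AS c6_max,
    MAX(aunion.c7) AS c7_max,
    MAX(aunion.c8) AS c8_max;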
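To sanity-check the expected output outside of Pig, the group-then-max logic can be sketched in plain Python. This is only an illustration of the semantics, not Pig code: the function name max_per_column and the sample rows are made up, chosen so that keys 001 and 200 reproduce the tuples shown in the question.

```python
from collections import defaultdict


def max_per_column(rows):
    """Group rows by their first field and take the column-wise max of the rest."""
    groups = defaultdict(list)
    for key, *flags in rows:
        groups[key].append(flags)
    # zip(*bag) transposes the bag so that each column can be reduced with max()
    return {key: tuple(max(col) for col in zip(*bag)) for key, bag in groups.items()}


# Hypothetical sample rows standing in for aunion: (ndggr, c1, ..., c8)
rows = [
    ("001", 1, 0, 0, 0, 0, 0, 0, 0),
    ("001", 0, 0, 0, 0, 1, 0, 0, 0),
    ("001", 0, 0, 0, 0, 0, 1, 0, 0),
    ("001", 0, 0, 0, 0, 0, 0, 0, 1),
    ("200", 1, 0, 0, 0, 0, 0, 0, 0),
    ("200", 0, 0, 0, 0, 0, 0, 0, 1),
]

result = max_per_column(rows)
print(result["001"])  # (1, 0, 0, 0, 1, 1, 0, 1)
print(result["200"])  # (1, 0, 0, 0, 0, 0, 0, 1)
```

Because every flag is 0 or 1, the per-column maximum acts as a logical OR over the rows of each key, which is why one tuple per ndggr comes out.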