PIG - how to group by a field that has multiple entries

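I want to be able to group by hour here, and I know the input will contain multiple entries for the same hour. For example, hour 11 below appears several times. How do I do this?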

hour,windSpeed
11, 3.6
2 , 6.8
11, 2.5
13, 5.0
14, 8.9
11, 3.2

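So I have this data, and I just want to group it by hour.

For example, we want {11: 3.6, 2.5, 3.2}

and the remaining hours, which have only one value each, would each go into a group of their own: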

{14: 8.9}

{2: 6.8}

answer = FOREACH weather_data GENERATE $0 AS hour, $1 AS speed;

Group by hour:

A = FOREACH weather_data GENERATE $0 AS hour, $1 AS speed;
B = GROUP A by hour;
DUMP B;
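
For intuition, here is a rough plain-Python emulation of what GROUP A BY hour does. It is not Pig code and not how Pig implements grouping; the rows are copied from the question, and the name group_by_hour is made up for this illustration. Note that Pig makes no ordering guarantee inside a bag, whereas the Python list below simply keeps input order.

```python
from collections import defaultdict

# Sample rows from the question: (hour, windSpeed)
rows = [(11, 3.6), (2, 6.8), (11, 2.5), (13, 5.0), (14, 8.9), (11, 3.2)]


def group_by_hour(records):
    """Collect every wind speed under its hour, similar in spirit to GROUP ... BY hour."""
    groups = defaultdict(list)
    for hour, speed in records:
        groups[hour].append(speed)
    return dict(groups)


grouped = group_by_hour(rows)
for hour, speeds in grouped.items():
    print(hour, speeds)
```

Hour 11 ends up with three speeds, while hours 2, 13 and 14 each get a group containing a single value.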

If you want to aggregate, use SUM:

C = FOREACH B GENERATE group AS hour, SUM(A.speed) AS Total;
DUMP C;
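
As a sanity check on what the aggregation should return, the same per-hour total can be sketched in plain Python. Again, this only emulates the result and is not Pig code; the rows come from the question, and total_speed_by_hour is a hypothetical helper name.

```python
from collections import defaultdict

# Sample rows from the question: (hour, windSpeed)
rows = [(11, 3.6), (2, 6.8), (11, 2.5), (13, 5.0), (14, 8.9), (11, 3.2)]


def total_speed_by_hour(records):
    """Add up the wind speeds for each hour, similar in spirit to SUM over a grouped bag."""
    totals = defaultdict(float)
    for hour, speed in records:
        totals[hour] += speed
    return dict(totals)


totals = total_speed_by_hour(rows)
for hour, total in sorted(totals.items()):
    print(hour, round(total, 1))
```

Rounding is applied only when printing, since floating-point addition may not give exactly 9.3 for hour 11.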

Try this.

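-- The sample data is comma-separated; PigStorage defaults to tab, so pass the delimiter
A = LOAD 'data' USING PigStorage(',') AS (Hour:chararray, windSpeed:chararray);
B = GROUP A BY Hour;
-- One row per hour, with a bag holding all of that hour's wind speeds
C = FOREACH B GENERATE group AS Hour, A.windSpeed;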

Note: this code is untested.