Pig script sum within a bag
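Sum the number of doubles and triples for each birthCity/birthState combination. Output the top 5 birthCity/birthState combinations that produced the players with the most doubles and triples.
This is what I have so far: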
clean = FOREACH filtered_2 GENERATE id,city,state, dble + tripple AS combined;
dump clean;
My question is how to satisfy the requirement above. Clearly I have to group by (city, state). Once I group, how do I get the sum within the bag?
counter = foreach clean {
sum1 = SUM(combined);
generate id,city,state,sum1;
};
I was thinking of something like this, but it doesn't work.
Group the relation clean by (city, state), then use SUM to get the total for each (city, state) group.
clean = FOREACH filtered_2 GENERATE id,city,state,(dble + tripple) AS combined;
clean_group = GROUP clean BY (city,state);
counter = FOREACH clean_group GENERATE FLATTEN(group) as (city,state),SUM(clean.combined) as sum1;
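To get the top 5 that the task asks for, one possible continuation is to sort the grouped totals in descending order and keep the first five rows. This is only a sketch that assumes counter is defined exactly as above; the aliases ordered and top5 are names chosen here for illustration.

```pig
-- Assumes counter has the schema (city, state, sum1) produced by the statement above.
-- Sort the per-(city, state) totals from largest to smallest.
ordered = ORDER counter BY sum1 DESC;
-- Keep only the first five rows of the sorted relation.
top5 = LIMIT ordered 5;
DUMP top5;
```

Because LIMIT is applied to the ordered relation, the five rows returned should be the ones with the largest sum1. If several groups tie on sum1, which of them is included is not specified by this sketch.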