Pig on grouping
I'm trying to count how many times each member_id occurs.
The data looks like this (member_id, item_type):
2020292 Abc
2020292 Acd
2020292 Abc
2938201 CDE
The output should then look like this (id, count):
2020292 3
2938201 1
I tried the following:
data=FOREACH data GENERATE member_id, item_type;
grouping=group data by member_id;
count_elements=foreach grouping generate flatten(group) as member_id, COUNT(data) as num_elements;
I also tried similar variants for count_elements, such as 'foreach grouping generate member_id, COUNT(data) as num_elements;'
and 'foreach grouping generate flatten(group) as member_id, COUNT(data.item_type) as num_elements;', and none of them worked.
Any help is greatly appreciated.
Thanks.
Input:
2020292,Abc
2020292,Acd
2020292,Abc
2938201,CDE
Code:
read = load 'test.data' using PigStorage(',') as (id:int,item_typ:chararray);
grouped_Data = group read by id;
describe grouped_Data;
count_val = foreach grouped_Data GENERATE group as (member_id:int),COUNT(read) as (rec_cnt:int);
dump count_val;
Output:
(2020292,3)
(2938201,1)
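To check the expected counts outside Pig, the GROUP ... BY id followed by COUNT above can be mirrored in a few lines of Python. This is only a sketch of the semantics, not Pig code; the `rows` list and the `count_by_id` helper are hypothetical names introduced for illustration.

```python
from collections import Counter

# Sample rows copied from the input above: (id, item_typ)
rows = [
    (2020292, "Abc"),
    (2020292, "Acd"),
    (2020292, "Abc"),
    (2938201, "CDE"),
]


def count_by_id(records):
    """Count how many records share each id, like GROUP BY id + COUNT."""
    counts = Counter(member_id for member_id, _item in records)
    # Sort by id so the result order is deterministic.
    return sorted(counts.items())


if __name__ == "__main__":
    for member_id, n in count_by_id(rows):
        print((member_id, n))
```

Pig itself does not guarantee the order of the dumped tuples; the sort here is only to make the sketch easy to compare by eye.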
Jenny, I've added code for your question, and also for the question you raised in the comment above (on @Learner's answer).
Input data:
2020292,Abc
2020292,Acd
2020292,Abc
2938201,CDE
Sample data for id_list:
2020292
2020291
2020290
Pig script:
data = LOAD '/pigsamples/groupdata' USING PigStorage(',')
AS (member_id:INT, item_type:CHARARRAY);
id_list_data = LOAD '/pigsamples/groupidlist' USING PigStorage(',') AS (member_id:INT);
group_data = GROUP data BY member_id;
count_grouped_data = FOREACH group_data GENERATE group AS member_id, COUNT(data) AS count;
join_data = JOIN count_grouped_data BY member_id, id_list_data BY member_id;
group_joined_data = FOREACH join_data GENERATE count_grouped_data::member_id
AS id, count_grouped_data::count AS count_item_type;
DUMP group_joined_data;
Output:
(2020292,3)
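Only one row comes back because JOIN in Pig is an inner join by default: of the three ids in id_list, only 2020292 also appears in the grouped counts. A rough Python sketch of that count-then-join logic follows; `data`, `id_list`, and `count_then_join` are hypothetical names used for illustration, not part of the script above.

```python
from collections import Counter

# Sample rows copied from the input data above: (member_id, item_type)
data = [
    (2020292, "Abc"),
    (2020292, "Acd"),
    (2020292, "Abc"),
    (2938201, "CDE"),
]

# Sample ids copied from the id_list data above
id_list = [2020292, 2020291, 2020290]


def count_then_join(records, wanted_ids):
    """Count records per member_id, then keep only ids present on both sides."""
    counts = Counter(member_id for member_id, _item in records)
    # Inner join: an id survives only if it is in wanted_ids AND has a count.
    return [(i, counts[i]) for i in wanted_ids if i in counts]


if __name__ == "__main__":
    print(count_then_join(data, id_list))
```

Ids 2020291 and 2020290 have no rows in the data, and 2938201 is not in id_list, so all three drop out of the joined result.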