How to perform GROUP BY and then use DISTINCT on another column in Pig
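I've just started learning Pig and need some help with the following problem. Thanks in advance!
For example, I have input like this:
Occupation Category Name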
Actress Acting Marion Cotillard
Actor Acting Liam Nelson
Tennis Plyr Athletics Roger Federer
Football Plyr Athletics Neymar
Actor Acting Tom Hanks
Actress Acting Elizabeth Banks
US Senator Politics Elizabeth Warren
Football Plyr Athletics Mesut Ozil
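I want to know how many distinct occupations each category has.
For example, Acting has two occupations, Actress and Actor, so the result would be 2.
The problem I'm facing: I can't apply DISTINCT to the 'Occupation' column on the output of 'group by Category'. :(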
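Apply DISTINCT first, then GROUP BY Category. Assuming you have already loaded the data into relation A:
Select the two columns you need after loading.
Apply DISTINCT to the resulting relation.
Group by Category.
Count the occupations in each category.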
B = FOREACH A GENERATE Occupation as Occupation,Category as Category;
C = DISTINCT B;
D = GROUP C BY Category;
E = FOREACH D GENERATE group,COUNT(C.Occupation);
DUMP E;
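For completeness, the steps above can be sketched as one end-to-end script. This is only a sketch under some assumptions: the input path `people.tsv` is hypothetical, the fields are assumed to be tab-separated, and the file is assumed to contain no header row (a header line would otherwise show up as its own category). The output field names `Category` and `occupation_count` are also just illustrative.

```pig
-- Hypothetical path; assumes tab-separated fields and no header row.
A = LOAD 'people.tsv' USING PigStorage('\t')
    AS (Occupation:chararray, Category:chararray, Name:chararray);

-- Keep only the two columns that matter for the count.
B = FOREACH A GENERATE Occupation, Category;

-- Remove duplicate (Occupation, Category) pairs.
C = DISTINCT B;

-- One group per category; each bag now holds one tuple per distinct occupation.
D = GROUP C BY Category;

-- Count the tuples in each group's bag.
E = FOREACH D GENERATE group AS Category, COUNT(C) AS occupation_count;

DUMP E;
```

With the sample input from the question, this should yield the tuples (Acting,2), (Athletics,2) and (Politics,1), though the order in which they are printed is not guaranteed.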
Try this:
x= load '<data>' using PigStorage('\t') as (occupation:chararray,category:chararray,name:chararray);
x_grouped= group x by category;
x_grouped_distinct= foreach x_grouped { cat= distinct x.occupation; generate $0, cat, COUNT(cat);};
dump x_grouped_distinct;