How can I use the MAX and COUNT functions together on two different tables that have been combined with a JOIN?

-- Pig program

User = LOAD 'path' USING PigStorage(',') as (id:int, reputation:int, displayname:chararray, loc:chararray, age:int);

Post = LOAD 'path' USING PigStorage(',') as (id:int, post_type:int, creationdate:chararray, score:int, viewcount:int, owneruser_id:int, title:chararray, answercount:chararray, commentcount:chararray);

a = JOIN User BY id, Post BY id;

DUMP a;

User_Group = Group a ALL;

Max_reputation = foreach User_Group Generate(User.displayname, User.reputation, Post.id), MAX(User.reputation), COUNT(Post.id);

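So basically I have loaded two different tables, User and Post, and then applied a JOIN to them.

Problem statement: find the display name and the number of posts of the user with the highest reputation.

So I need the display name and reputation from User, and the id from Post.

I want to apply MAX(User.reputation) and COUNT(Post.id) to the result of the JOIN, i.e. a.

Please help.

Also, which is the better approach: apply the JOIN and then compute MAX and COUNT, or compute MAX and COUNT first and then JOIN?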

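Problem statement: find the display name and the number of posts of the user with the highest reputation.

First, find the display name of the user with the highest reputation using the relation "User" alone.

Then join that result with the relation "Post" to collect all posts belonging to that user. Finally, group by the user id and compute the count.

The following code should help you get there: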

User = LOAD 'path' USING PigStorage(',') as (id:int, reputation:int, displayname:chararray, loc:chararray, age:int);

Post = LOAD 'path' USING PigStorage(',') as (id:int, post_type:int, creationdate:chararray, score:int, viewcount:int, owneruser_id:int, title:chararray, answercount:chararray);

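-- Put every user into one bag so the highest reputation is found across all users
-- (grouping BY id would return every user, since each id forms its own group).
User_grp = GROUP User ALL;

User_each = FOREACH User_grp
                 {
                   User_order = ORDER User BY reputation DESC;
                   User_limit = LIMIT User_order 1;
                   User_nested = FOREACH User_limit GENERATE id, displayname;
                   -- Aliases are case-sensitive: User_nested, not user_nested.
                   GENERATE FLATTEN(User_nested) AS (user_id, displayname);
                 };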
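-- Posts point to their author through owneruser_id; Post.id is the post's own id.
User_join = JOIN User_each BY user_id, Post BY owneruser_id;

User_grouping = GROUP User_join BY user_id;

-- displayname is the same on every joined row, so MAX simply picks that value.
User_output = FOREACH User_grouping GENERATE group AS user_id, MAX(User_join.displayname) AS displayname, COUNT(User_join.Post::id) AS post_cnts;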
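
On the question of whether to aggregate before or after the JOIN: aggregating first usually pushes less data through the join, because Post shrinks to one row per owner and User shrinks to the top user(s). Below is a minimal sketch of that variant, not a definitive implementation. It assumes that Post.owneruser_id holds the id of the posting user. The aliases User_all, Max_rep, Top_user, Post_by_owner, Post_counts, Result_join and Result are made up for illustration, and 'path' is left exactly as in the question.

```pig
User = LOAD 'path' USING PigStorage(',') AS (id:int, reputation:int, displayname:chararray, loc:chararray, age:int);
Post = LOAD 'path' USING PigStorage(',') AS (id:int, post_type:int, creationdate:chararray, score:int, viewcount:int, owneruser_id:int, title:chararray, answercount:chararray, commentcount:chararray);

-- Step 1: highest reputation, computed with MAX over all users.
User_all = GROUP User ALL;
Max_rep = FOREACH User_all GENERATE MAX(User.reputation) AS max_reputation;

-- Step 2: keep the user(s) with that reputation. Max_rep has exactly one row,
-- so it can be used as a scalar here. Ties are all kept.
Top_user = FILTER User BY reputation == (int)Max_rep.max_reputation;

-- Step 3: COUNT posts per owner before joining (assumes owneruser_id is the author's id).
Post_by_owner = GROUP Post BY owneruser_id;
Post_counts = FOREACH Post_by_owner GENERATE group AS owner_id, COUNT(Post) AS post_cnt;

-- Step 4: join the two small relations and keep only the fields asked for.
Result_join = JOIN Top_user BY id, Post_counts BY owner_id;
Result = FOREACH Result_join GENERATE Top_user::displayname AS displayname, Top_user::reputation AS reputation, Post_counts::post_cnt AS post_cnt;

DUMP Result;
```

Because this is an inner join, a top user with no posts would not appear in Result; using `JOIN Top_user BY id LEFT OUTER, Post_counts BY owner_id` would keep that user with a null post_cnt.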