Apache Pig GROUP BY, ORDER BY
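I have a bag of (playerName, gameName, score) tuples.
I first GROUP that bag BY game, which gives me another bag. Now I want, in a further bag, the tuple with the highest score for each game. How should I do this?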
Input:
jon,mario,2345
joe,minesweeper,234
peter,mario,112
lisa,minesweeper,900
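Pig script:
-- Load the CSV with an explicit schema so that score sorts numerically.
game_data = LOAD 'game_data.csv' USING PigStorage(',') AS (player:chararray, game:chararray, score:long);
-- One group (and one inner bag of records) per game.
game_data_grp_by_game = GROUP game_data BY game;
game_kpis = FOREACH game_data_grp_by_game {
    -- Within each game, sort by score descending and keep the first record.
    ord_game_data_by_score = ORDER game_data BY score DESC;
    max_score_record = LIMIT ord_game_data_by_score 1;
    -- max_score_record holds a single tuple, so each FLATTEN yields exactly one value.
    GENERATE group AS game, FLATTEN(max_score_record.player) AS player_name, FLATTEN(max_score_record.score) AS score;
};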
Output of DUMP game_kpis:
(mario,jon,2345)
(minesweeper,lisa,900)
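As a possible alternative to the nested ORDER and LIMIT, Pig's built-in TOP function can pick the highest-scoring tuple from each group directly. The following is a minimal sketch, assuming the same game_data.csv and schema as above; the alias top_per_game is a name made up for this illustration. TOP takes the number of tuples to keep, the zero-based index of the column to compare (2 is score here), and the bag to search.

```pig
-- Same load and grouping as in the script above.
game_data = LOAD 'game_data.csv' USING PigStorage(',') AS (player:chararray, game:chararray, score:long);
game_data_grp_by_game = GROUP game_data BY game;

-- top_per_game is a hypothetical alias used only in this sketch.
-- TOP(1, 2, game_data) keeps the single tuple with the largest value in
-- column 2 (score) from each group's bag; FLATTEN unwraps that one-tuple bag.
top_per_game = FOREACH game_data_grp_by_game GENERATE FLATTEN(TOP(1, 2, game_data));

DUMP top_per_game;
```

Each resulting tuple keeps the fields in load order (player, game, score). If two players tie for the highest score in a game, both this sketch and the LIMIT 1 version return only one of them, and which one is not guaranteed.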