Finding the list of years and the number of events that occurred each year using Pig
The dataset looks like this:
id,event,year,rating,duration
1,f1,1980,3.4,4200
2,f2,1960,4.2,7273
3,f3,1980,2.1,2721
4,f4,1960,3.5,7212
5,f5,1960,2.1,7786
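How can I find the list of years and the number of events that occurred in each year?
I have tried the following, but it does not work and reports a schema error: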
events = load 'event' using pigstorage ',' as (id:int, event:chararray, year:int, rating:float, duration:int);
list_of_years = group events by year;
no_of_events = foreach list_of_years generate count(moviename);
dump no._of_events;
Answer:
First, your load statement is incorrect:
events = load 'event' using pigstorage ',' as (id:int, event:chararray, year:int , rating:float, duration:int); -- incorrect
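PigStorage is a function, so the delimiter has to be passed as an argument: PigStorage(','). Function names in Pig are case-sensitive, so it must be written PigStorage and COUNT, not pigstorage and count.
Two more things in your script will also fail: count(moviename) refers to a field that is not in the declared schema (most likely the source of the schema error), and the dump statement uses no._of_events although the alias defined above is no_of_events.
Now, to solve your problem:
Input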
1,f1,1980,3.4,4200
2,f2,1960,4.2,7273
3,f3,1980,2.1,2721
4,f4,1960,3.5,7212
5,f5,1960,2.1,7786
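Pig script
-- Load the data using the correct syntax and delimiter.
events = load 'stack_case001.txt' using PigStorage(',') as (id:int, event:chararray, year:int, rating:float, duration:int);
-- Group the data by year.
list_of_years = GROUP events BY year;
-- Iterate over the grouped data and count the items in each group,
-- which gives the number of events that occurred in each year.
number_of_events_per_year = FOREACH list_of_years GENERATE group, COUNT($1);
-- Print the output to the screen.
DUMP number_of_events_per_year;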
Output
(1960,3)
(1980,2)
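In the script, $1 refers by position to the bag that GROUP produces. As a minimal sketch of the same idea, the bag can also be referenced by name, since it takes the name of the grouped relation (events here). The aliases events_by_year, events_per_year, event_count and sorted_counts are illustrative names, not part of the original script, and the ORDER step is an optional addition for a stable listing.

```pig
-- Load the same comma-separated input file as in the script above.
events = LOAD 'stack_case001.txt' USING PigStorage(',')
         AS (id:int, event:chararray, year:int, rating:float, duration:int);

-- Each grouped tuple has two fields: `group` (the year) and `events` (a bag of the matching rows).
events_by_year = GROUP events BY year;

-- Count the rows in each bag, referring to the bag by name rather than by $1.
events_per_year = FOREACH events_by_year
                  GENERATE group AS year, COUNT(events) AS event_count;

-- Optional: sort by year so the listing order does not depend on the execution plan.
sorted_counts = ORDER events_per_year BY year;

DUMP sorted_counts;
```

Naming the output fields (year, event_count) makes later statements such as FILTER or ORDER easier to read than positional references.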
Hope this helps.