查找年份和编号列表。每年使用猪发生的事件

finding list of years and no. of events occurred each year using pig

数据集详细信息为:

id,event,year,rating,duration

1,f1,1980,3.4,4200

2,f2,1960,4.2,7273

3,f3,1980,2.1,2721

4,f4,1960,3.5,7212

5,f5,1960,2.1,7786

你怎么能找到年份和编号的列表。每年发生的事件?

我已经试过了,但我没有用它显示架构错误

events = load 'event' using pigstorage ',' as (id:int, event:chararray, year:int, rating:float, duration:int);

list_of_years = group events by year;

no_of_events = foreach list_of_years generate count(moviename);

dump no._of_events;

答案如下:

首先,您的加载语句不正确:

events = load 'event' using pigstorage ',' as (id:int, event:chararray, year:int , rating:float, duration:int); -- 不正确

PigStorage是一个函数,正确的写法是PigStorage(',')

现在解决你的问题,

输入

1,f1,1980,3.4,4200 2,f2,1960,4.2,7273 3,f3,1980,2.1,2721 4,f4,1960,3.5,7212 5,f5,1960,2.1,7786

猪文

//使用正确的语法和分隔符加载数据。

events = load 'stack_case001.txt' using PigStorage(',') as (id:int, event:chararray, year:int, rating:float, duration:int) ;

//按年份分组数据

list_of_years = 按年份分组事件;

//通过迭代分组数据,统计分组实体对应的item数量,统计每年发生的事件数

number_of_events_per_year = FOREACH list_of_years 生成组,COUNT($1);

//在屏幕上打印输出。

转储number_of_events_per_year;

输出

(1960,3)

(1980,2)

希望对您有所帮助。