Get week using date data and do some calculation in Pig
My data looks like this:
(201601030637,2,64.001213)
(201601030756,3,63.5869656667)
(201601040220,2,62.758471)
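The first column is the year (2016), month (01), day (03), hour (06), and minute (37) concatenated together.
I want to sum the values in the third column by week. How can I group the rows into the 52 weekly groups of the year? Can anyone help?
Thanks!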
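Use GetWeek to create a new column from the first column, then group by the new column and use SUM. This assumes you have already loaded the data into relation A.
-- GetWeek expects a datetime, so the first column is parsed with ToDate first
B = FOREACH A GENERATE $0, $1, $2, GetWeek(ToDate((chararray)$0, 'yyyyMMddHHmm')) AS week_of_year;
C = GROUP B BY week_of_year;
-- sum the third column ($2) within each week
D = FOREACH C GENERATE group, SUM(B.$2);
DUMP D;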
Use ToDate to convert the date string to a datetime, then GetWeek to get the week number. Finally, GROUP by the week number and apply SUM.
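-- load the timestamp as a string so ToDate can parse it with a pattern
A = LOAD '/path_to_data/data' USING PigStorage(',') as (c1: chararray, c2: int, c3: float);
B = FOREACH A GENERATE GetWeek(ToDate(c1,'yyyyMMddHHmm')) as weeknum, c1, c2, c3;
-- the question asks for the third column, so sum c3 rather than c2
C = FOREACH (GROUP B BY weeknum) GENERATE group as weeknum, SUM(B.c3) as c3_sum;
DUMP C;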
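To sanity-check the week numbers outside Pig, here is a minimal Python sketch of the same parse, week, group, and sum steps on the three sample rows. It assumes GetWeek follows ISO-8601 week numbering (the function name `weekly_sums` and the `rows` list are illustrative, not part of either answer). Under that assumption a year can have 53 weeks, and 2016-01-03 falls in week 53 of ISO year 2015, so the sketch groups by (year, week) rather than by week alone.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (yyyyMMddHHmm, second column, third column)
rows = [
    ("201601030637", 2, 64.001213),
    ("201601030756", 3, 63.5869656667),
    ("201601040220", 2, 62.758471),
]


def weekly_sums(records):
    """Sum the third column per (ISO year, ISO week)."""
    totals = defaultdict(float)
    for stamp, _second, value in records:
        ts = datetime.strptime(stamp, "%Y%m%d%H%M")
        # isocalendar() yields (ISO year, ISO week, ISO weekday)
        iso_year, iso_week, _weekday = ts.isocalendar()
        totals[(iso_year, iso_week)] += value
    return dict(totals)


sums = weekly_sums(rows)
for (iso_year, iso_week), total in sorted(sums.items()):
    print(iso_year, iso_week, round(total, 6))
```

If the data spans more than one year, grouping in Pig by week number alone would merge rows from different years into the same group, so adding a year key to the GROUP may be worth considering.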