Sum multiple columns using PIG
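I have multiple files with the same columns, and I am trying to aggregate the values in two of the columns using SUM.
The column structure is as follows: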
ID first_count second_count name desc
1 10 10 A A_Desc
1 25 45 A A_Desc
1 30 25 A A_Desc
2 20 20 B B_Desc
2 40 10 B B_Desc
How can I sum first_count and second_count per ID to get the following result?
ID first_count second_count name desc
1 65 80 A A_Desc
2 60 30 B B_Desc
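Below is the script I wrote, but when I run it I get the error "Could not infer the matching function for SUM as multiple or none of them fit. Please use an explicit cast."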
A = LOAD '/output/*/part*' AS (id:chararray,first_count:chararray,second_count:chararray,name:chararray,desc:chararray);
B = GROUP A BY id;
C = FOREACH B GENERATE group as id,
SUM(A.first_count) as first_count,
SUM(A.second_count) as second_count,
A.name as name,
A.desc as desc;
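Your LOAD statement is the problem. first_count and second_count are loaded as chararray, and SUM cannot add strings. If you are sure these columns will only ever contain numbers, load them as int. Try this: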
A = LOAD '/output/*/part*' AS (id:chararray,first_count:int,second_count:int,name:chararray,desc:chararray);
That should work.
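For completeness, here is a minimal sketch of what the whole script could look like with that LOAD change applied. Two things in it go beyond the fix above and are my own assumptions, not part of the original script. First, A.name and A.desc inside the FOREACH produce a bag of values per group rather than a single value, so the sketch groups by (id, name, description) instead; this assumes every id maps to exactly one name and description. Second, the desc field is renamed to description because DESC is listed as a reserved keyword in Pig Latin and may not parse as a field name on every version. It also assumes the input is tab-delimited, which is what the default PigStorage loader expects.

```pig
-- Load the two count columns as int so SUM can pick its integer overload.
-- Assumes tab-delimited input (the default PigStorage delimiter).
A = LOAD '/output/*/part*'
    AS (id:chararray, first_count:int, second_count:int,
        name:chararray, description:chararray);

-- Group by id together with name and description so they stay scalar
-- values; assumes each id has exactly one name/description pair.
B = GROUP A BY (id, name, description);

-- 'group' is a tuple (id, name, description); SUM runs over each group's bag.
C = FOREACH B GENERATE
        group.id            AS id,
        SUM(A.first_count)  AS first_count,
        SUM(A.second_count) AS second_count,
        group.name          AS name,
        group.description   AS description;

DUMP C;
```

If an id can carry more than one name or description, this grouping would produce one output row per combination, so the grouping key would need to be adjusted.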