Sum of fields across columns in Pig
I have the following test data.
A B C
M O
M M M
M M M
N O
P N
I want to get the count of each value across all columns, e.g. M=7, N=2, O=2, P=1, where A, B, and C are the column headers. I wrote the code below.
test = LOAD 'testdata' USING PigStorage(',') AS (A:chararray, B:chararray, C:chararray);
values = FOREACH test GENERATE (A==''?'null':(A is null?'null':A)) AS A, (B==''?'null':(B is null?'null':B)) AS B, (C==''?'null':(C is null?'null':C)) AS C;
grp = GROUP values ALL;
A = FOREACH grp {
B =FILTER test.A=='M' OR test.B=='M' OR test.C=='M';
C =FILTER test.A=='N' OR test.B=='N' OR test.C=='N';
D =FILTER test.A=='O' OR test.B=='O' OR test.C=='O';
E =FILTER test.A=='P' OR test.B=='P' OR test.C=='P';
GENERATE group, COUNT(B), COUNT(C),COUNT(D),COUNT(E);
};
I get the error "Scalar has more than one row in the output".
Any input would be helpful!
Load each record as a single line, tokenize it into fields, then group by value and count.
A = load 'testdata' as (line:chararray);
B = foreach A generate flatten(TOKENIZE((chararray)line)) as word;
C = group B by word;
D = foreach C generate group,COUNT(B);
DUMP D;
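To check the expected counts without a Pig installation, the same tokenize-then-group idea can be sketched in Python. This is only an emulation, not Pig code: the `ROWS` list holds the sample rows from the question without the header line, `count_tokens` is a hypothetical helper name, and the delimiter pattern is an assumed approximation of the default TOKENIZE delimiters (space, double quote, comma, parentheses, star).

```python
import re
from collections import Counter

# Sample rows copied from the question, without the "A B C" header line.
ROWS = ["M O", "M M M", "M M M", "N O", "P N"]

# Assumption: approximates Pig's default TOKENIZE delimiters
# (whitespace, double quote, comma, parentheses, star).
DELIMITERS = re.compile(r'[\s",()*]+')


def count_tokens(lines):
    """Count how often each token occurs across all lines."""
    counts = Counter()
    for line in lines:
        # Like FLATTEN(TOKENIZE(line)): one entry per non-empty token.
        tokens = [t for t in DELIMITERS.split(line) if t]
        # Like GROUP B BY word followed by COUNT(B).
        counts.update(tokens)
    return counts


if __name__ == "__main__":
    for value, n in sorted(count_tokens(ROWS).items()):
        print(value, n)
```

Because the split pattern also treats commas as delimiters, a comma-separated row with an empty field, such as "M,,O", yields only the two non-empty tokens, so empty columns do not need special handling in this sketch.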