Count of values across fields in Pig

I have the following test data.

A   B   C
M   O
M   M   M
M   M   M
N       O
P       N

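I want to get the total number of entries in this sample test data, i.e. 12.

I have the code below to do this, but it gives me an incorrect result.

Any help on how to correct it would be appreciated.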

test = LOAD 'testdata' USING PigStorage(',') AS (A:chararray, B:chararray, C:chararray);
values = FOREACH test GENERATE (A==''?'null':(A is null?'null':A)) AS A, (B==''?'null':(B is null?'null':B)) AS B, (C==''?'null':(C is null?'null':C)) AS C;
grp = GROUP values ALL;
counting = FOREACH grp GENERATE group, COUNT(values.A)+COUNT(values.B)+COUNT(values.C);

This gives 15 instead of 12.

I also want the count of each value, e.g. M=7, N=2, O=2, P=1. I wrote the code below.

test = LOAD 'testdata' USING PigStorage(',') AS (A:chararray, B:chararray, C:chararray);
values = FOREACH test GENERATE (A==''?'null':(A is null?'null':A)) AS A, (B==''?'null':(B is null?'null':B)) AS B, (C==''?'null':(C is null?'null':C)) AS C;
grp = GROUP values ALL;
A = FOREACH grp {
    B = FILTER test.A=='M' OR test.B=='M' OR test.C=='M';
    GENERATE group, COUNT(B);
};

I get the error "Scalar has more than one row in the output".

You are also counting the column names in the final count. Modify the script to ignore the first row, then group and count.

test = LOAD 'testdata' USING PigStorage(',') AS (A:chararray, B:chararray, C:chararray);

-- RANK prepends a rank column; the header line gets rank 1.
ranked = RANK test;
test1 = FILTER ranked BY ($0 > 1); -- Note: the named field rank_test should also work in place of $0.

values = FOREACH test1 GENERATE (A==''?'null':(A is null?'null':A)) AS A, (B==''?'null':(B is null?'null':B)) AS B, (C==''?'null':(C is null?'null':C)) AS C;
grp = GROUP values ALL;
counting = FOREACH grp GENERATE group, COUNT(values.A)+COUNT(values.B)+COUNT(values.C);
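For the per-value counts (M=7, N=2, ...), one possible approach is to turn every field into its own row and then group on the value. As for the "Scalar has more than one row" error: inside the FOREACH over grp, test.A refers to the relation test rather than to the grouped bag, so Pig treats it as a scalar projection, which only works when that relation has exactly one row. The script below is a rough sketch rather than a tested solution. It assumes the file is comma-delimited as in the question and that its first line is the header; the alias names data_rows, cells, entries and so on are made up for illustration.

```pig
-- Sketch only: assumes 'testdata' is comma-delimited and line 1 is the header "A,B,C".
test = LOAD 'testdata' USING PigStorage(',') AS (A:chararray, B:chararray, C:chararray);

-- Drop the header line via the rank column that RANK prepends (rank_test).
ranked = RANK test;
data_rows = FILTER ranked BY rank_test > 1;

-- Turn the three columns of each row into three single-value rows.
cells = FOREACH data_rows GENERATE FLATTEN(TOBAG(A, B, C)) AS (val:chararray);

-- Keep only real entries: drop nulls and empty or blank strings.
entries = FILTER cells BY val IS NOT NULL AND TRIM(val) != '';

-- Total number of entries across all fields.
all_entries = GROUP entries ALL;
total = FOREACH all_entries GENERATE COUNT(entries) AS n;

-- Number of occurrences of each distinct value.
by_val = GROUP entries BY val;
per_val = FOREACH by_val GENERATE group AS val, COUNT(entries) AS n;

DUMP total;
DUMP per_val;
```

Because the values are flattened into a single column first, the total and the per-value counts come from the same relation, and no value such as 'M' has to be hard-coded in a FILTER.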