COUNT 工作时 Pig 中的 AVG、MIN、MAX 错误
AVG, MIN, MAX error in Pig while COUNT works
我正在尝试在 Pig 中使用 AVG、MIN、MAX。 MIN 和 MAX 函数在执行时都卡住了,AVG 函数抛出错误。但是 COUNT 函数工作正常。
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (GRADE 2 TEACHER,{(65587.90)}), 2nd :(GRADE 4 TEACHER,{(56567.24)})
我的代码:
register 'pig/contrib/piggybank/java/piggybank.jar';
define Replace org.apache.pig.piggybank.evaluation.string.REPLACE();
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage. CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
B = foreach A generate name,job,REPLACE(salary,',','') as salary:float, REPLACE(TA,',','') as TA:float, type, org, year;
C = filter B by type=='LBOE';
D = filter C by year==2010;
E = group D by job;
number = foreach E generate group,COUNT(D.salary);
average = foreach E genetate group,AVG(D.salary);
minim = foreach E genetate group,MIN(D.salary);
maxim = foreach E genetate group,MAX(D.salary);
样本数据
(ABBOTT,DEEDEE W,GRADES 9-12 TEACHER,52,122.10,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABBOTT,RYAN V,GRADE 4 TEACHER,56,567.24,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABBOUD,CLAUDIA MORA,GRADES K-5 TEACHER,63,957.50,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDUL-JABBAR,KHADEEJA ,GRADES 9-12 TEACHER,16,791.73,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDUL-RAZACQ,SALAHUD-DIN ,INSTRUCTIONAL SPECIALIST P-8,45,832.92,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,DIANA ,SPECIAL ED PARAPRO/AIDE,10,934.94,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,NADIYAH W,GRADES 6-8 TEACHER,75,109.92,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,RHONDALYN Y,SPECIAL ED PARAPRO/AIDE,28,649.34,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(OSBORNE,CHRISTINE L,INSTRUCTIONAL SUPERVISOR,78,875.59,3,265.71,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)
(OSBORNE,DORIS A,OCCUPATIONAL THERAPIST ,65,421.79,1,156.05,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)
第 7 行 GROUP 操作后的示例数据。
(GRADE 2 TEACHER,{(OSBORNE,VIRGINIA E,GRADE 2 TEACHER,65587.90,0,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)})
(GRADE 4 TEACHER,{(ABBOTT,RYAN V,GRADE 4 TEACHER,56567.24,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)})
(MAINTENANCE PERSONNEL,{(BROOKS,RICHARD M,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(SUMNER,ROBERT O,MAINTENANCE PERSONNEL,72655.53,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(MCCULLOUGH,ALVIN J,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(DALTON,JAMES E,MAINTENANCE PERSONNEL,72655.52,2124.60,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(SMITH,KEVIN W,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(MANGHAM,LARRY G,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010)})
这是 Pig 中的错误吗?请帮助我。
这是更新后的 Pig 脚本。
register 'pig/contrib/piggybank/java/piggybank.jar';
define Replace org.apache.pig.piggybank.evaluation.string.REPLACE();
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage. CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
B = foreach A generate name,job,REPLACE(salary,',','') as salary, REPLACE(TA,',','') as TA, type, org, year;
B1 = foreach B generate name, job, (double)salary, (double)TA, type, org, year;
C = filter B1 by type=='LBOE';
D = filter C by year==2010;
E = group D by job;
number = foreach E generate group,COUNT(D.salary);
average = foreach E generate group,AVG(D.salary);
minim = foreach E generate group,MIN(D.salary);
maxim = foreach E generate group,MAX(D.salary);
问题是,您需要为 salary
和 TA
属性提供显式转换。
我正在尝试在 Pig 中使用 AVG、MIN、MAX。 MIN 和 MAX 函数在执行时都卡住了,AVG 函数抛出错误。但是 COUNT 函数工作正常。
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (GRADE 2 TEACHER,{(65587.90)}), 2nd :(GRADE 4 TEACHER,{(56567.24)})
我的代码:
register 'pig/contrib/piggybank/java/piggybank.jar';
define Replace org.apache.pig.piggybank.evaluation.string.REPLACE();
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage. CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
B = foreach A generate name,job,REPLACE(salary,',','') as salary:float, REPLACE(TA,',','') as TA:float, type, org, year;
C = filter B by type=='LBOE';
D = filter C by year==2010;
E = group D by job;
number = foreach E generate group,COUNT(D.salary);
average = foreach E genetate group,AVG(D.salary);
minim = foreach E genetate group,MIN(D.salary);
maxim = foreach E genetate group,MAX(D.salary);
样本数据
(ABBOTT,DEEDEE W,GRADES 9-12 TEACHER,52,122.10,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABBOTT,RYAN V,GRADE 4 TEACHER,56,567.24,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABBOUD,CLAUDIA MORA,GRADES K-5 TEACHER,63,957.50,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDUL-JABBAR,KHADEEJA ,GRADES 9-12 TEACHER,16,791.73,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDUL-RAZACQ,SALAHUD-DIN ,INSTRUCTIONAL SPECIALIST P-8,45,832.92,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,DIANA ,SPECIAL ED PARAPRO/AIDE,10,934.94,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,NADIYAH W,GRADES 6-8 TEACHER,75,109.92,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(ABDULLAH,RHONDALYN Y,SPECIAL ED PARAPRO/AIDE,28,649.34,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)
(OSBORNE,CHRISTINE L,INSTRUCTIONAL SUPERVISOR,78,875.59,3,265.71,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)
(OSBORNE,DORIS A,OCCUPATIONAL THERAPIST ,65,421.79,1,156.05,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)
第 7 行 GROUP 操作后的示例数据。
(GRADE 2 TEACHER,{(OSBORNE,VIRGINIA E,GRADE 2 TEACHER,65587.90,0,LBOE,COBB COUNTY SCHOOL DISTRICT,2010)})
(GRADE 4 TEACHER,{(ABBOTT,RYAN V,GRADE 4 TEACHER,56567.24,0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010)})
(MAINTENANCE PERSONNEL,{(BROOKS,RICHARD M,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(SUMNER,ROBERT O,MAINTENANCE PERSONNEL,72655.53,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(MCCULLOUGH,ALVIN J,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(DALTON,JAMES E,MAINTENANCE PERSONNEL,72655.52,2124.60,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(SMITH,KEVIN W,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010),(MANGHAM,LARRY G,MAINTENANCE PERSONNEL,72655.52,0,LBOE,FULTON COUNTY BOARD OF EDUCATION,2010)})
这是 Pig 中的错误吗?请帮助我。
这是更新后的 Pig 脚本。
register 'pig/contrib/piggybank/java/piggybank.jar';
define Replace org.apache.pig.piggybank.evaluation.string.REPLACE();
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage. CSVLoader() AS (name:chararray,job:chararray,salary:chararray,TA:chararray,type:chararray,org:chararray,year:int);
B = foreach A generate name,job,REPLACE(salary,',','') as salary, REPLACE(TA,',','') as TA, type, org, year;
B1 = foreach B generate name, job, (double)salary, (double)TA, type, org, year;
C = filter B1 by type=='LBOE';
D = filter C by year==2010;
E = group D by job;
number = foreach E generate group,COUNT(D.salary);
average = foreach E generate group,AVG(D.salary);
minim = foreach E generate group,MIN(D.salary);
maxim = foreach E generate group,MAX(D.salary);
问题是,您需要为 salary
和 TA
属性提供显式转换。