Loading a CSV file in Pig Latin
I am trying to load a CSV file in Pig Latin. The records look like this:
"ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010
I tried the following code:
A = LOAD '/user/hduser/salaryTravel.csv' using PigStorage(',') AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);
But the output is:
("ABBOTT,DEEDEE W",,,122.10",0,)
The name field is split into separate fields because the name itself contains a comma (','). How can I read this record correctly?
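I suggest loading the data with the CSVExcelStorage or CSVLoader API from piggybank. Both treat a comma inside a double-quoted field as part of the value rather than as a delimiter.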
REGISTER piggybank.jar;
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage() AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);
or
REGISTER piggybank.jar;
A = LOAD '/user/hduser/salaryTravel.csv' using org.apache.pig.piggybank.storage.CSVLoader() AS (name:chararray,job:chararray,salary:float,TA:float,type:chararray,org:chararray,year:int);
Reference: REGEX_EXTRACT error in PIG, which shares some code examples.
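
To see why a plain comma split breaks this record while a quote-aware parser does not, here is a minimal sketch. It is written in Python rather than Pig so it can be run on its own, and it only imitates the two behaviours; it is not how PigStorage or CSVExcelStorage are implemented. The helper names naive_split, quote_aware_split and parse_salary are made up for this illustration.

```python
import csv
import io

# The sample record from the question.
RECORD = ('"ABBOTT,DEEDEE W",GRADES 9-12 TEACHER,"52,122.10",0,LBOE,'
          'ATLANTA INDEPENDENT SCHOOL SYSTEM,2010')


def naive_split(line):
    """Split on every comma and ignore quotes (a plain delimiter split)."""
    return line.split(',')


def quote_aware_split(line):
    """Split only on commas outside double quotes, and drop the quotes."""
    return next(csv.reader(io.StringIO(line)))


def parse_salary(text):
    """Remove thousands separators before converting to a number."""
    return float(text.replace(',', ''))


if __name__ == '__main__':
    print(len(naive_split(RECORD)))        # more fields than the schema declares
    fields = quote_aware_split(RECORD)
    print(len(fields), fields[0])          # name stays in one field
    print(parse_salary(fields[2]))         # salary without the embedded comma
```

Even with a quote-aware loader, the salary value 52,122.10 still contains a comma, so a direct float conversion is likely to fail. One option, assuming your data looks like the sample, is to load salary as chararray and strip the commas before casting.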