How to ignore "," in data fields

I am trying to generate the following...

Input:

396124436476092416,"Think about the life you livin but don't think so hard it hurts Life is truly a gift, but at the same it is a curse",Obey_Jony09
396124440112951296,"00:00 #MAW",WesleyBitton

A = LOAD '/user/root/data/tweets.csv' USING PigStorage(',') as (users:chararray, tweets:chararray);
B = FILTER A by users == '396124436476092416';

The output is truncated: (396124436476092416,"Think about the life you livin but don't think so hard it hurts Life is truly a gift)

Expected output: (396124436476092416,"Think about the life you livin but don't think so hard it hurts Life is truly a gift, but at the same it is a curse")

I don't want to read each row in as a single line.

You can load the data with CSVLoader.

However, if you would rather not do that, here is a workaround in plain Apache Pig:

-- Load your data

A  = LOAD 'your/path/users.csv' USING TextLoader() AS (unparsed:chararray);

-- Replace each " character with | so that the tweet text is delimited

B = FOREACH A GENERATE REPLACE(unparsed, '\"', '|') AS parsed:chararray;

-- Store the intermediate parsed data at your location

STORE B INTO 'your/path/parsed_users.csv' USING PigStorage('|');

-- Load the parsed data

C = LOAD 'your/path/parsed_users.csv' USING PigStorage('|') AS (users:chararray, tweets:chararray);

-- Dump your data. The fields will still contain a stray comma (,), but you can strip it with REPLACE; you get the idea.

DUMP C;
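
If the intermediate STORE and the stray comma are a nuisance, the same idea can be sketched in one pass with the built-in REGEX_EXTRACT_ALL. This is only a sketch: it assumes every row has exactly the shape id,"tweet",user seen in the sample input, and the field names id, tweet, and user_name are illustrative labels, not names from the question.

```pig
-- Placeholder path, same as in the workaround above; substitute your own.
A = LOAD 'your/path/users.csv' USING TextLoader() AS (unparsed:chararray);

-- REGEX_EXTRACT_ALL returns one tuple of capture groups per matching line:
--   group 1: everything before the first comma (the id)
--   group 2: the text between the opening quote and the last ", (commas kept)
--   group 3: everything after the last comma (the user name)
B = FOREACH A GENERATE
      FLATTEN(REGEX_EXTRACT_ALL(unparsed, '([^,]*),"(.*)",([^,]*)'))
      AS (id:chararray, tweet:chararray, user_name:chararray);

C = FILTER B BY id == '396124436476092416';
DUMP C;
```

Rows that do not fit this shape (an unquoted tweet, say, or a comma in the user name) will not be parsed correctly, so treat this as a workaround for this particular file rather than a general CSV parser.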

This is standard CSV quoting, so you just need to use CSVLoader, which supports double-quoted fields that contain commas and other double quotes escaped with backslashes.

Here is how to use it:

register file:/home/hadoop/lib/pig/piggybank.jar
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
A = LOAD '/user/root/data/tweets.csv' USING CSVLoader AS (users:chararray, tweets:chararray); 
B = FILTER A by users == '396124436476092416';
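
Note that the sample input appears to have three columns (an id, the tweet text, and a user name), while the schema above names only two. A sketch assuming that three-column layout follows; the field names id, tweet, and user_name are illustrative labels, not names from the question.

```pig
-- Same jar path and loader alias as above.
REGISTER file:/home/hadoop/lib/pig/piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

-- Assumed layout: id,"tweet text",user name
A = LOAD '/user/root/data/tweets.csv' USING CSVLoader AS (id:chararray, tweet:chararray, user_name:chararray);

-- Filter on the id column and print the matching rows.
B = FILTER A BY id == '396124436476092416';
DUMP B;
```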