Apache Pig: convert rows to a single column delimited with a character
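I need to collapse the VALUE column into a single row per city: the values are grouped by CITY and separated by the "|" (pipe) character.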
DATA = LOAD '/tmp/test.dat' USING PigStorage(',') AS (
CITY:chararray,
VALUE:chararray
);
Input (City, Value):
Istanbul,1
Istanbul,2
Istanbul,3
New York,8
New York,9
Output:
Istanbul,1|2|3
New York,8|9
First group on CITY, then use BagToString (http://pig.apache.org/docs/r0.15.0/func.html#bagtostring) to convert each group's values into the string representation you want. Something like this (untested!):
data = LOAD '/tmp/test.dat' using PigStorage(',') AS (city:chararray, value:chararray);
data_grp = GROUP data BY city;
result = FOREACH data_grp GENERATE group AS city, BagToString(data.value, '|') AS values;
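One thing the snippet above leaves open is the order of the values inside each group: Pig does not promise any particular order for the tuples in a grouped bag, so 1|2|3 could come out as 3|1|2. A possible variant, equally untested, sorts each bag in a nested FOREACH block before joining it. The aliases sorted and vals are names made up for this sketch, and because value is loaded as a chararray the sort is lexical ('10' sorts before '2'); cast to int if numeric order matters.

```pig
-- Sketch only: same load and group steps as above, plus a per-group sort.
data = LOAD '/tmp/test.dat' USING PigStorage(',') AS (city:chararray, value:chararray);
data_grp = GROUP data BY city;

result = FOREACH data_grp {
    -- Sort the tuples of this group's bag (lexical order, since value is a chararray).
    sorted = ORDER data BY value;
    -- Join the sorted values with the pipe character.
    GENERATE group AS city, BagToString(sorted.value, '|') AS vals;
};

-- Print the result to the console; use STORE instead to write it to a file.
DUMP result;
```

If the order of the values does not matter, the shorter version without the nested block should be enough.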