Collect data by id in Hive
I have a table whose rows have the following format:
user | purchase | time_of_purchase | quantity
Sample:
1234 | Bread | Jul 7 20:48| 1
1234 | Shaving Cream | July 10 14:20 | 2
5678 | Milk | July 7 3:48 | 1
5678 | Bread | July 7 3:49 | 2
5678 | Bread | July 7 15:30 | 1
I want to build a purchase history for each user in the following format:
1234 | {[Bread , Jul 7 20:48,1] ,[ Shaving Cream , July 10 14:20, 2 ]}
5678 | {[Milk, July 7 3:48 , 1 ] , [Bread , July 7 3:49 , 2], [Bread , July 7 15:30 , 1]}
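Is it possible to do this in a Hive or Pig script? I tried collect_list, but it does not keep the values from different columns aligned row by row. I also tried the Brickhouse collect, but it behaves like collect_set, so I lose part of the information.
Pig script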
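-- Load the purchases; PigStorage(',') assumes file.txt is comma-delimited
File = LOAD 'file.txt' using PigStorage(',') as (user:int, Purchase:chararray, timeofpurchase:chararray, quantity:int);
-- Group the rows by user; each group carries a bag with that user's complete rows
GRP_USER = GROUP File by user;
DUMP GRP_USER;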
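The grouped bag above still repeats the user id inside every tuple. If only the purchase, time, and quantity are wanted per user, as in the format asked for in the question, a projection on the bag is one way to get there. This is a minimal sketch, not a tested solution: the relation names `purchases`, `by_user`, and `history` and the field name `purchase_history` are made up for illustration, and the input is again assumed to be comma-delimited.

```pig
-- Assumption: file.txt is comma-delimited, as in the script above.
purchases = LOAD 'file.txt' USING PigStorage(',')
    AS (user:int, purchase:chararray, timeofpurchase:chararray, quantity:int);

by_user = GROUP purchases BY user;

-- Keep one (purchase, timeofpurchase, quantity) tuple per input row,
-- so repeated items (e.g. the two Bread rows for user 5678) are not collapsed.
history = FOREACH by_user GENERATE
    group AS user,
    purchases.(purchase, timeofpurchase, quantity) AS purchase_history;

DUMP history;
```

Because every tuple comes from a single input row, the three columns stay aligned. Pig does not guarantee the order of tuples inside a bag, though. A nested ORDER BY on timeofpurchase would sort the strings lexicographically rather than chronologically, so the timestamps would probably need to be parsed into a sortable form first.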
You can find several examples at http://ybhavesh.blogspot.com/
Hope this helps.