How to extract distinct values from a bag of tuples?
I have the following data structure in Pig after describing the relation:
--------------------------------------------------------------------------------------------------------------------------------------------------------
| summed_hours_and_miles_by_driver | group:int | :bag{:tuple(driver_name:chararray)} | total_hours:long | total_miles:long |
--------------------------------------------------------------------------------------------------------------------------------------------------------
| | 27 | {(Mark Lochbihler), ..., (Mark Lochbihler)} | 220 | 11006 |
--------------------------------------------------------------------------------------------------------------------------------------------------------
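The problem is that the driver name (Mark Lochbihler) is repeated many times in the bag of tuples.
How can I reduce it to a single name, like DISTINCT in SQL?
Use Distinct. Assuming A is your relation, something like this: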
summed_hours_and_miles_by_driver = FOREACH grp GENERATE
group,
org.apache.pig.builtin.Distinct(A.driver_name),
total_hours,
total_miles;
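The same result can usually be reached with the DISTINCT operator inside a nested FOREACH, which is the form shown in the Pig documentation. The following is only a sketch under assumptions: A is taken to be the relation that was grouped, and the grouping key driverId and the columns hours_logged and miles_logged are hypothetical names that do not appear in the question, so substitute your own.

```pig
-- Assumption: A holds one row per log entry with driver_name plus the
-- hypothetical columns driverId, hours_logged and miles_logged.
grp = GROUP A BY driverId;

summed_hours_and_miles_by_driver = FOREACH grp {
    -- Project the name column out of the grouped bag...
    names = A.driver_name;
    -- ...then de-duplicate it within each group.
    unique_names = DISTINCT names;
    GENERATE
        group,
        unique_names,
        SUM(A.hours_logged) AS total_hours,
        SUM(A.miles_logged) AS total_miles;
};
```

If every group really contains only one driver name, the second field should then be a bag holding a single tuple.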