PIG - FILTER, or how to get inside a bag or tuple

As you can see, we can apply the first filter because we can use aggregates on the temperature. Now how do we apply the second filter on STRINGS?

We are trying to filter e so that only clear and partly cloudy conditions remain.

Weather = LOAD 'hdfs:/home/hduser/final/Weather.csv' USING PigStorage(',');
A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$3 AS temp, $4 AS cond, (double)$5 AS dewpoint, (double)$6 AS wind;


group_by_day = GROUP A BY (year,month,day);

Schema:

   {day: (year: int, month: int, day: int), temperature: {(temp: int)},
    condition: {(cond: bytearray)}, dewPoint: {(dewpoint: double)},
    windSpeed: {(wind: double)}}
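The question does not show the step that turns group_by_day into this schema. A plausible sketch, assuming the relation is the C that the answer filters, and assuming each field of A was projected into its own bag (the field aliases are taken from the schema above):

```pig
-- Hypothetical step: rename the group key (a tuple of year, month, day)
-- to "day" and project each field of the grouped bag A into its own bag.
C = FOREACH group_by_day GENERATE
        group      AS day,
        A.temp     AS temperature,
        A.cond     AS condition,
        A.dewpoint AS dewPoint,
        A.wind     AS windSpeed;

-- Print the resulting schema to compare it with the one above.
DESCRIBE C;
```

Projecting a field out of a bag (A.temp) yields a bag of single-field tuples, which is why MIN(temperature) and MAX(temperature) work directly on it.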

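You have to cast cond to chararray in the statement below. Since you did not specify data types in the LOAD statement, all fields are loaded as bytearray, which is the default data type PigStorage uses when no schema is given.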

A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$3 AS temp, (chararray)$4 AS cond, (double)$5 AS dewpoint, (double)$6 AS wind;
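An alternative to casting every column is to declare the schema in the LOAD statement itself, so PigStorage converts the fields as it reads them. A minimal sketch, assuming the CSV columns appear in the same order as in the positional FOREACH above:

```pig
-- Declare names and types at load time; no per-field casts are needed.
-- Column order is assumed to match $0..$6 in the FOREACH version.
A = LOAD 'hdfs:/home/hduser/final/Weather.csv'
    USING PigStorage(',')
    AS (year:int, month:int, day:int, temp:int,
        cond:chararray, dewpoint:double, wind:double);

-- Grouping is unchanged; cond is now chararray inside the grouped bag.
group_by_day = GROUP A BY (year, month, day);
```

Keeping the alias A means the rest of the script (A.temp, A.cond, and so on) does not need to change.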

EDIT

I was able to get the result by using the BagToString function. You can do the filtering in one step itself.

D = FILTER C BY (MIN(temperature) >= 60 AND MAX(temperature) <= 79) AND (BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy');

Or, in your case:

f = FILTER e BY BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy';
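One caveat: BagToString joins all elements of the bag with '_' by default, so the equality test above only matches days whose condition bag holds a single value; a day with two readings would produce a string such as clear_clear. If each day has several readings, one option (a different technique from the answer's, assuming cond is chararray and assuming the goal is days where every reading is clear or partly cloudy) is a nested FOREACH with an inner FILTER. The aliases day_stats, bad, bad_count and good_days are hypothetical:

```pig
-- For each day, collect the readings whose condition is outside the
-- accepted set. Readings with a null cond make the comparison null,
-- so FILTER drops them and they are not counted as "bad".
day_stats = FOREACH group_by_day {
    bad = FILTER A BY NOT (cond == 'clear' OR cond == 'partly cloudy');
    GENERATE group      AS day,
             A.temp     AS temperature,
             COUNT(bad) AS bad_count;
};

-- Keep days with no out-of-set readings and the answer's temperature range.
good_days = FILTER day_stats BY bad_count == 0
                            AND MIN(temperature) >= 60
                            AND MAX(temperature) <= 79;
```

Filtering inside the nested block works on the individual tuples of the bag, so it does not depend on how many readings a day has.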