PIG - Filter, or how to get inside a bag or tuple
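As you can see, we can apply the first filter because we can use aggregates on the temperature. Now how can we apply the second filter, which is on STRINGS?
We are just trying to filter e so that only the clear and partly cloudy conditions remain.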
Weather = LOAD 'hdfs:/home/hduser/final/Weather.csv' USING PigStorage(',');
A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$3 AS temp, $4 AS cond, (double)$5 AS dewpoint, (double)$6 AS wind;
group_by_day = GROUP A BY (year,month,day);
Schema:
{day: (year: int, month: int, day: int), temperature: {(temp: int)},
condition: {(cond: bytearray)}, dewPoint: {(dewpoint: double)},
windSpeed: {(wind: double)}}
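You have to cast cond to chararray, as in the statement below. Since you did not specify data types in the load statement, every field is loaded as bytearray, which is the default data type for fields loaded through PigStorage without a schema.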
A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$3 AS temp, (chararray)$4 AS cond, (double)$5 AS dewpoint, (double)$6 AS wind;
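As an alternative to casting in a FOREACH, the types can be declared in the LOAD statement itself with an AS clause. The following is only a sketch: the column order is an assumption taken from the FOREACH in the question, and it has not been checked against the actual Weather.csv.

```pig
-- Declare field names and types at load time, so cond arrives as chararray
-- and no later cast is needed.
-- ASSUMPTION: the CSV columns appear in this order (taken from the question).
Weather = LOAD 'hdfs:/home/hduser/final/Weather.csv' USING PigStorage(',')
          AS (year:int, month:int, day:int, temp:int,
              cond:chararray, dewpoint:double, wind:double);

-- Group directly on the typed relation.
group_by_day = GROUP Weather BY (year, month, day);
```

With the schema declared this way, DESCRIBE on the grouped relation should report cond as chararray rather than bytearray.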
EDIT
I was able to get the result by using the BagToString function. You can do the filtering in a single step.
D = FILTER C BY (MIN(temperature) >= 60 AND MAX(temperature) <= 79) AND (BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy');
Or, in your case:
f = FILTER e BY BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy';
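One caveat: BagToString concatenates every element of the bag, using '_' as the default delimiter, so the equality test above only matches when the condition bag for a day contains a single value. If a day can hold several observations, a nested FOREACH is one possible alternative. The sketch below assumes e has the schema shown earlier with cond cast to chararray, and that the goal is to keep only days where every observation is clear or partly cloudy; the names e_checked, other, and n_other are made up for illustration.

```pig
-- ASSUMPTION: e has the fields day, temperature, condition, dewPoint, windSpeed,
-- with condition: {(cond: chararray)}.
e_checked = FOREACH e {
    -- Collect the observations that are neither 'clear' nor 'partly cloudy'.
    other = FILTER condition BY cond != 'clear' AND cond != 'partly cloudy';
    -- Carry all original fields along, plus the count of unwanted observations.
    GENERATE *, COUNT(other) AS n_other;
};

-- Keep only the days with no unwanted observations.
f = FILTER e_checked BY n_other == 0;
```

If the intent is instead to keep any day that has at least one clear or partly cloudy observation, the inner FILTER can select the wanted conditions and the outer FILTER can test for a count greater than zero.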