Creating a massive filter in Pig

I have this code:

large = LOAD 'a super large file';

-- field positions ($0, $1) are assumed; the original references were lost
CC = FILTER large BY $0 == 'abc' OR $1 == 'abc'
OR $0 == 'def' OR $1 == 'def' ....;

The number of OR conditions can run into the hundreds or even thousands.

Is there a better way?

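Yes. Put those conditions in another file, load it into a relation, and join the two relations on the column you are filtering on. The join keeps only the rows of the large relation whose value appears in the filter relation. If you have to filter on multiple columns, create one filter relation per column. Below is an example for 2 columns.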

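large = LOAD 'a super large file';
filter1 = LOAD 'file with values needed to compare with';  -- values to match against $0 (assumed)
filter2 = LOAD 'file with values needed to compare with';  -- values to match against $1 (assumed)
-- Each join keeps only the rows of large whose key appears in the filter relation.
f1 = JOIN large BY $0, filter1 BY $0;
f2 = JOIN large BY $1, filter2 BY $0;
final = UNION f1, f2;
DUMP final;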

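Alternatively, you could use a single filter file with multiple columns, join on each of those columns to get the different filtered results, and then union the relations.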

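large = LOAD 'a super large file';
filter_file = LOAD 'file with values in different columns';

-- Column positions are assumed: filter column $0 is matched against large $0, and $1 against $1.
f1 = JOIN large BY $0, filter_file BY $0;
f2 = JOIN large BY $1, filter_file BY $1;
final = UNION f1, f2;
DUMP final;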
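
One thing to watch with the join-and-union approach: the joined relations also carry the filter columns, and a row that matches on both columns shows up twice after the UNION. Here is a rough sketch of one way to deal with that, not a definitive implementation. The paths, the tab delimiter, and the schema (c0, c1, payload, k) are all made up for illustration. It also uses a replicated (map-side) join, which should only help if the filter list is small enough to fit in memory.

```pig
-- Hypothetical paths and schema, for illustration only.
large = LOAD 'large_input' USING PigStorage('\t')
        AS (c0:chararray, c1:chararray, payload:chararray);
raw_keys = LOAD 'filter_values' AS (k:chararray);

-- Deduplicate the filter values so the join does not multiply rows.
keys = DISTINCT raw_keys;

-- In a replicated join the small relation is listed last and held in memory.
m0 = JOIN large BY c0, keys BY k USING 'replicated';
m1 = JOIN large BY c1, keys BY k USING 'replicated';

-- Project back to the columns of large, dropping the joined key column.
r0 = FOREACH m0 GENERATE large::c0 AS c0, large::c1 AS c1, large::payload AS payload;
r1 = FOREACH m1 GENERATE large::c0 AS c0, large::c1 AS c1, large::payload AS payload;

-- Rows that matched on both columns appear in r0 and r1; DISTINCT removes the repeats.
both = UNION r0, r1;
result = DISTINCT both;
STORE result INTO 'filtered_output';
```

Note that DISTINCT also collapses rows that were genuinely duplicated in the large file, which the original FILTER would have kept, so drop that step if those duplicates matter.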