NOT IN function in Pig
I am trying to find the differences between two tables (source and destination) in Pig using the DIFF() function. This is what I have so far:
sourcenew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Source.txt' USING PigStorage(',') as (ID:chararray,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
destnew = LOAD 'hdfs://HADOOPMASTER:54310/DVTTest/Destination.txt' USING PigStorage(',') as (ID:chararray,Name:chararray,FirstName:chararray ,LastName:chararray,Vertical_Name:chararray ,Vertical_ID:chararray,Gender:chararray,DOB:chararray,Degree_Percentage:chararray ,Salary:chararray,StateName:chararray);
cogroupnew= COGROUP sourcenew by ID inner, destnew by ID inner;
diffnew = FOREACH cogroupnew GENERATE DIFF(sourcenew,destnew);
DUMP diffnew;
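This gives the differences between the two tables, or returns an empty bag {} if the tuples match. It works fine up to this point. My next step is to find the extra records in the source that are not present in the destination. For that I tried: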
cogroupextrainsource= COGROUP sourcenew by ID inner, destnew by ID;
filterextrainsource= FILTER cogroupextrainsource BY ID NOT (cogroupnew)
As expected, this throws an error.
I need help finding the extra records in the source.
Any help would be appreciated.
Thanks!
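You don't need the $ sign next to the column name ID. $ is only used when you refer to a column by position (for example $0) rather than by name.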
cogroupextrainsource = COGROUP sourcenew by ID inner, destnew by ID;
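Pig has no NOT IN operator for comparing a field against another relation, so one common way to get the source-only records is to filter the COGROUP result on an empty destination bag. The following is a minimal sketch, assuming sourcenew and destnew are loaded exactly as in the question. The aliases onlyinsource and extrarecords are hypothetical names chosen for illustration.

```pig
-- Assumes sourcenew and destnew are already loaded as in the question.
-- INNER on sourcenew drops groups whose source bag is empty;
-- destnew is left outer, so source IDs without a match keep an empty destnew bag.
cogroupextrainsource = COGROUP sourcenew BY ID INNER, destnew BY ID;

-- Keep only the groups with no matching destination record.
onlyinsource = FILTER cogroupextrainsource BY IsEmpty(destnew);

-- Unnest the source bag back into ordinary records (hypothetical alias).
extrarecords = FOREACH onlyinsource GENERATE FLATTEN(sourcenew);

DUMP extrarecords;
```

Swapping the roles of the two relations (INNER on destnew, IsEmpty(sourcenew)) should give the records that exist only in the destination.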