Hadoop Pig: Show entries using STARTSWITH
I'm having trouble using the STARTSWITH string function. I want to display all records whose System_Period starts with 20040.
transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv'
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int,
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int,
Product_Number:int, Sales_Amount:double, Employee_Number:int,
Service_Date:chararray, System_Period:int);
sysGroup = GROUP transactions BY System_Period;
sysFilter = FILTER sysGroup BY STARTSWITH(transactions.System_Period, 20040);
DUMP sysFilter;
The error I get is:
Could not infer the matching function for org.apache.pig.builtin.STARTSWITH as multiple or none of them fit. Please use an explicit cast.
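STARTSWITH compares two strings and checks whether the first one begins with the second. You cannot pass a relation or a bag to it, which is what transactions.System_Period is after the GROUP. Note also that it only accepts strings (chararray), not integers, so both the int field and the literal 20040 fail to match. Load System_Period as a chararray, filter for values starting with 20040 before the GROUP BY, and cast the field back after the filter if you need to.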
transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv'
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int,
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int,
Product_Number:int, Sales_Amount:double, Employee_Number:int,
Service_Date:chararray, System_Period:chararray);
sysFilter = FILTER transactions BY STARTSWITH(System_Period, '20040');
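If the integer type is needed downstream, the cast after the filter might look like the following. This is a minimal sketch that reuses the sysFilter relation from above; the alias sysTyped is a name introduced here for illustration, and it assumes every System_Period value that passes the filter is numeric.

```pig
-- Cast System_Period back to int once the string filter has been applied.
-- sysTyped is a hypothetical alias; the other fields are passed through unchanged.
sysTyped = FOREACH sysFilter GENERATE
    Branch_Number, Contract_Number, Customer_Number, Invoice_Date,
    Invoice_Number, Product_Number, Sales_Amount, Employee_Number,
    Service_Date, (int)System_Period AS System_Period;
DUMP sysTyped;
```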
Alternatively, after the GROUP BY, FLATTEN the result and then filter:
transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv'
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int,
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int,
Product_Number:int, Sales_Amount:double, Employee_Number:int,
Service_Date:chararray, System_Period:chararray);
sysGroup = GROUP transactions BY System_Period;
flatres = FOREACH sysGroup GENERATE group,FLATTEN(transactions);
sysFilter = FILTER flatres BY STARTSWITH(System_Period, '20040');
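If the grouped structure should be kept instead of flattened, one possible variant is to filter on the group key itself. This is a sketch rather than part of the original answer: it assumes System_Period was loaded as chararray as above, so that group is a chararray, and groupFilter is an alias introduced here for illustration.

```pig
-- group holds the System_Period key of each group, so it can be tested directly.
groupFilter = FILTER sysGroup BY STARTSWITH(group, '20040');
DUMP groupFilter;
```

Each surviving row is still a (group, bag of transactions) pair, which is convenient if an aggregate per System_Period follows.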