Unable to store alias C while trying to use a Python UDF in Pig
My Python UDF code:
# commaFormat - format a number with commas, 12345 -> 12,345
@outputSchema("numformat:chararray")
def commaFormat(num):
    return '{:,}'.format(num)
My Pig script:
DEFINE CSVExcelStorage org.apache.pig.piggybank.storage.CSVExcelStorage;
A = LOAD '/result.csv' using CSVExcelStorage() As (id:int,lastvisitedtime:chararray,title:chararray,typedcount:int,URL:chararray,visitcount:int,bytes:int);
B = limit A 15;
REGISTER '/data/pyudf/test.py' USING streaming_python AS myudfs;
C = FOREACH B generate myudfs.commaFormat();
Pig stack trace:
--------------- ERROR 1002: Unable to store alias C
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C
    at org.apache.pig.PigServer.openIterator(PigServer.java:1019)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:747)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:630)
    at org.apache.pig.Main.main(Main.java:176)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias C
    at org.apache.pig.PigServer.storeEx(PigServer.java:1122)
    at org.apache.pig.PigServer.store(PigServer.java:1081)
    at org.apache.pig.PigServer.openIterator(PigServer.java:994)
    ... 13 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: C: Store(hdfs://localhost:54310/tmp/temp1063554930/tmp-651585063:org.apache.pig.impl.io.InterStorage) - scope-16 Operator Key: scope-16): org.apache.pig.impl.streaming.StreamingUDFException: LINE : KeyError: 'concatMult4'
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:159)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:157)
    at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:306)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1474)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1459)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1118)
    ... 15 more
Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE : KeyError: 'concatMult4'
    at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:503)
First, you are missing the () in your DEFINE statement.
REGISTER /path/piggybank.jar;
DEFINE CSVExcelStorage org.apache.pig.piggybank.storage.CSVExcelStorage();
You are probably using Mortar's CPython distribution, which requires at least Pig 0.12. Try using the jython scripting engine instead.
REGISTER '/data/pyudf/test.py' USING jython AS myudfs;
C = FOREACH B GENERATE myudfs.commaFormat(visitcount); -- the UDF takes one argument; visitcount is used here only as an example field
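If you switch engines, the UDF file itself may also need a small change. The following is only a sketch of test.py under a few assumptions: that streaming_python needs outputSchema imported from pig_util, that the jython engine already provides outputSchema, and that the bundled Jython may be older than 2.7 and so may not accept the ',' format specifier. The no-op fallback decorator is hypothetical and exists only so the file can be imported and tested outside Pig.

```python
# test.py - sketch of a comma-formatting UDF that avoids '{:,}'.format
try:
    # Assumption: required when registering with streaming_python.
    from pig_util import outputSchema
except ImportError:
    try:
        # Assumption: the jython engine has already defined this name.
        outputSchema
    except NameError:
        # Hypothetical no-op stand-in so the file also runs outside Pig.
        def outputSchema(schema):
            def wrap(func):
                return func
            return wrap


@outputSchema("numformat:chararray")
def commaFormat(num):
    """Format an integer with thousands separators, e.g. 12345 -> '12,345'."""
    if num is None:
        # Pig passes nulls as None; return null instead of raising.
        return None
    n = int(num)
    sign = '-' if n < 0 else ''
    digits = str(abs(n))
    groups = []
    # Peel off three digits at a time from the right.
    while digits:
        groups.insert(0, digits[-3:])
        digits = digits[:-3]
    return sign + ','.join(groups)
```

Whichever engine you use, the function expects one argument, so the Pig call has to pass a field, for example myudfs.commaFormat(visitcount).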
Alternatively, you can use the REPLACE function to easily remove the commas, instead of writing a UDF.
REGISTER /path/piggybank.jar;
DEFINE CSVExcelStorage org.apache.pig.piggybank.storage.CSVExcelStorage();
A = LOAD '/result.csv' using CSVExcelStorage() AS (id:int,lastvisitedtime:chararray,title:chararray,typedcount:int,URL:chararray,visitcount:int,bytes:int);
B = FOREACH A GENERATE id,REPLACE(lastvisitedtime,',',''),title,typedcount,URL,visitcount,bytes;
C = LIMIT B 15;
DUMP C;
Pig does not handle the dependent modules that your Python UDF brings in.
So you need to package them in a JAR and register that file as part of your Pig script.
REGISTER '/data/pyudf/test.py' USING jython AS myudfs;