Regular Expression in PigStorage()
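I am stuck while loading a log file into HDFS through Pig.
Basically, this log file contains WebSphere Server exceptions that I want to analyze.
Now, to break the exception details into their components and load them into a schema, I need to pass a regular expression to the PigStorage constructor, but I am unable to do so.
My code: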
inputFile = load '/datalake/xxx/yyy/bd_cni/log_analytics_project/raw_data/APSRP7420/SystemOut_16.05.22_11.46.13.log' USING PigStorage('\[\d+\/\d+\/\d+\s+\d+\:\d+\:\d+\:\d+\s+\w+\]') as (someColumnName:chararray);
The error I get:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 147> Unexpected character '['
Details at logfile: /home/rshukla8/pig_1466510599995.log
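I am completely new to Pig and Unix, so any pointers here would help.
You can use Piggybank's CombinedLogLoader, as shown below (note that it expects logs in the Apache combined access log format):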
REGISTER '<path of piggybank jar>/piggybank.jar';
logs = LOAD '/in/combined_access_log' USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader()
AS (addr: chararray, logname: chararray, user: chararray, time: chararray,
method: chararray, uri: chararray, proto: chararray,
status: int, bytes: int, referer: chararray, userAgent: chararray);
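To check whether a log actually fits that loader, here is a rough Java sketch of the line shape a combined-log loader expects, using the sample line from the Apache HTTP Server log-format documentation. The regex and the CombinedLogLineDemo class are my own approximation, not Piggybank's implementation; its 11 groups follow the order of the AS (...) schema above. The WebSphere SystemOut.log line in it is invented for illustration, and it does not match, which suggests this loader may not suit the log in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedLogLineDemo {
    // Approximate regex for the Apache combined log format (an assumption, not Piggybank's code):
    // host ident authuser [date] "method uri proto" status bytes "referer" "user-agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+|-) \"([^\"]*)\" \"([^\"]*)\"$");

    // Returns the 11 fields in schema order, or null if the line is not in combined format.
    static String[] parse(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] names = {"addr", "logname", "user", "time", "method", "uri", "proto",
                          "status", "bytes", "referer", "userAgent"};
        // Sample line from the Apache HTTP Server log-format documentation.
        String apacheLine = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" "
            + "200 2326 \"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
        String[] fields = parse(apacheLine);
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + fields[i]);
        }
        // Invented WebSphere SystemOut.log line: it does not fit the combined format.
        String wasLine = "[5/22/16 11:46:13:123 EDT] 0000001a SystemOut     O Application started";
        System.out.println(parse(wasLine) == null ? "no match" : "match");
    }
}
```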
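PigStorage cannot be instantiated with a regex, since its argument is only a field delimiter. Instead, you can use MyRegExLoader(String pattern) from the Piggybank library:
REGISTER '<path of piggybank jar>/piggybank.jar';
-- Backslashes are doubled so that Pig's string parser passes single backslashes to the regex.
-- Each capturing group ( ... ) becomes one field, so the timestamp is wrapped in a group.
inputFile = load '/datalake/xxx/yyy/bd_cni/log_analytics_project/raw_data/APSRP7420/SystemOut_16.05.22_11.46.13.log' USING org.apache.pig.piggybank.storage.MyRegExLoader('\\[(\\d+\\/\\d+\\/\\d+\\s+\\d+\\:\\d+\\:\\d+\\:\\d+\\s+\\w+)\\]') as (someColumnName:chararray);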
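As far as I can tell from the Piggybank source, MyRegExLoader (through RegExLoader) applies the pattern to each line with find() and emits one field per capturing group, skipping lines that do not match. The Java sketch below imitates that per-line step so you can test a pattern outside Pig. extractFields and toPigLiteral are hypothetical helpers, the WebSphere log line is invented, and the escaping in toPigLiteral assumes Pig reads \\ as an escaped backslash and \' as an escaped quote. It shows that the question's pattern, which has no groups, would yield empty tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLoaderDemo {
    // Imitates one line of a capture-group-based regex loader (an approximation,
    // not Piggybank's code): null if the pattern is not found in the line,
    // otherwise one field per capturing group (an empty list if there are no groups).
    static List<String> extractFields(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    // Hypothetical helper: wraps a Java regex in a single-quoted Pig string literal,
    // doubling each backslash and escaping single quotes.
    static String toPigLiteral(String regex) {
        return "'" + regex.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) {
        // Invented WebSphere SystemOut.log line, for illustration only.
        String line = "[5/22/16 11:46:13:123 EDT] 0000001a SystemOut     O Application started";

        // The pattern from the question: it matches, but has no capturing groups.
        Pattern noGroups = Pattern.compile(
            "\\[\\d+\\/\\d+\\/\\d+\\s+\\d+\\:\\d+\\:\\d+\\:\\d+\\s+\\w+\\]");
        System.out.println(extractFields(noGroups, line)); // []

        // Same pattern with the timestamp wrapped in a group: one field.
        Pattern oneGroup = Pattern.compile(
            "\\[(\\d+\\/\\d+\\/\\d+\\s+\\d+\\:\\d+\\:\\d+\\:\\d+\\s+\\w+)\\]");
        System.out.println(extractFields(oneGroup, line)); // [5/22/16 11:46:13:123 EDT]

        // A second group for the rest of the line gives a two-column schema.
        Pattern twoGroups = Pattern.compile(
            "\\[(\\d+\\/\\d+\\/\\d+\\s+\\d+\\:\\d+\\:\\d+\\:\\d+\\s+\\w+)\\]\\s*(.*)");
        System.out.println(extractFields(twoGroups, line));

        // Pig string literal for the one-group pattern, to paste into MyRegExLoader(...).
        System.out.println(toPigLiteral(oneGroup.pattern()));
    }
}
```

With the two-group pattern, the Pig schema would need two columns, for example as (logTime:chararray, message:chararray) (hypothetical names). Since the loader reads input line by line, each line of a multi-line stack trace is still a separate record.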