Using REGEX_EXTRACT_ALL, but the projection returns "()"

I am using Cloudera QuickStart 5.4. I have a file in which every line contains data such as:

323.81.303.680 - - [25/Oct/2011:01:41:00 -0500] "GET /download/download6.zip HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19"

In Apache Pig, I am using the following script:

A= LOAD 'weblog.txt' using TextLoader() as (line:chararray);
B= FOREACH A GENERATE 
FLATTEN(REGEX_EXTRACT_ALL(line,'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] “(.+?)” (\S+) (\S+) “([^”]*)” “([^”]*)”')) AS (remoteAddr: chararray, remoteLogname: chararray, user: chararray, time:chararray, request: chararray, status:int,bytes_string:chararray,referrer: chararray, browser: chararray);

DUMP B;

The output of the above query looks like this:

()
()

Can anyone tell me what I am doing wrong? Is the regular expression correct?

Add ", line" at the end, after the ")" that closes the AS (...) schema and before the ";":

A= LOAD 'weblog.txt' using TextLoader() as (line:chararray);
B= FOREACH A GENERATE FLATTEN(
    REGEX_EXTRACT_ALL(line,'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(.+?)" (\S+) (\S+) "([^"]*)" "([^"]*)"')) 
     AS (remoteAddr: chararray, remoteLogname: chararray, user: chararray, time:chararray, request: chararray, status:int,bytes_string:chararray,referrer: chararray, browser: chararray)
       , line;

DUMP B;

As for the regular expression, it matches the sample string fine; see the regex demo.
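To check the pattern outside Pig, here is a minimal sketch in Python. It assumes that REGEX_EXTRACT_ALL requires the whole line to match (approximated here with re.fullmatch) and that Python's re module treats this particular pattern the same way Java's regex engine does. The helper name extract_all is hypothetical. It also tries a variant with the curly quotes (“ ”) that appear in the question's script, since the log line itself only contains straight quotes.

```python
import re

# Sample log line from the question (straight double quotes).
LINE = (
    '323.81.303.680 - - [25/Oct/2011:01:41:00 -0500] '
    '"GET /download/download6.zip HTTP/1.1" 200 0 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) '
    'Gecko/2010031422 Firefox/3.0.19"'
)

# Pattern as written in the answer's script: straight quotes.
STRAIGHT = (
    r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] '
    r'"(.+?)" (\S+) (\S+) "([^"]*)" "([^"]*)"'
)

# Pattern as written in the question's script: curly quotes.
CURLY = (
    r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] '
    r'“(.+?)” (\S+) (\S+) “([^”]*)” “([^”]*)”'
)


def extract_all(pattern, text):
    """Hypothetical stand-in for REGEX_EXTRACT_ALL.

    Returns a tuple of all capture groups if the whole text matches,
    otherwise None (assumed full-match semantics).
    """
    match = re.fullmatch(pattern, text)
    return match.groups() if match else None


if __name__ == "__main__":
    print("straight quotes:", extract_all(STRAIGHT, LINE))
    print("curly quotes:   ", extract_all(CURLY, LINE))
```

Under these assumptions the straight-quote pattern yields nine fields, while the curly-quote pattern yields no match at all, which would be consistent with the empty "()" rows. If your script was pasted from a web page or word processor, it may be worth checking that its quotes are plain ASCII.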