PCRE Regex - Match everything up to the first pipe not enclosed in square brackets
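I have the following line of text, from which I am trying to extract everything up to the first pipe character that is not enclosed in square brackets.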
action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$" | stats values(savedsearch_name) AS search_name
Expected output:
action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$"
i.e. everything except the trailing | stats values(savedsearch_name) AS search_name
Working from some lookaround examples, I can (almost) get what I need with this JavaScript regex:
/.*\|(?![^\[]*\])/g
- http://refiddle.com/refiddles/596dec4c75622d608f290000
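But this does not translate well into a valid PCRE-compatible expression (also, I want to capture everything up to, but not including, the first pipe).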
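To see why the lookahead regex only "almost" works, here is a minimal Python sketch (Python's re module accepts the same syntax once the /.../g delimiters are dropped; the helper js_style_prefix and the sample strings are hypothetical). The greedy .* backtracks to the last pipe that passes the lookahead rather than the first one, the match includes the pipe itself, and (?![^\[]*\]) only looks ahead as far as the next [, so a pipe inside a bracket that contains a nested bracket is treated as unbracketed.

```python
import re

# The question's JavaScript pattern with the /.../g delimiters removed;
# Python's re supports the same negative lookahead syntax.
JS_STYLE = re.compile(r".*\|(?![^\[]*\])")


def js_style_prefix(text: str) -> str:
    """Return what the lookahead pattern matches at the start of text ('' if nothing)."""
    m = JS_STYLE.match(text)
    return m.group() if m else ""


# Hypothetical inputs: two unbracketed pipes, and a pipe inside nested brackets.
print(repr(js_style_prefix("a [x | y] | b | c")))  # 'a [x | y] | b |'  (last pipe, pipe included)
print(repr(js_style_prefix("a [x | [y] ] z")))     # 'a [x |'  (bracketed pipe accepted)
```

On the question's own line it happens to work, because that line has only one pipe outside brackets.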
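From what I have read, the nested square brackets inside the first bracketed group may make this too complex to solve? Within any given set there is only one level of nested brackets (e.g. [..[]..] or [..[]..[]..]).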
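Because the question guarantees at most one level of nesting, recursion is not strictly required: the inner level can be spelled out by hand. The following is a minimal Python sketch under that assumption (Python's standard re module has no (?1) recursion; the names ONE_LEVEL and before_first_top_level_pipe are hypothetical). It anchors at the start of the string, consumes characters that are neither brackets nor pipes as well as whole bracketed groups, and stops at the first pipe outside brackets.

```python
import re

# Assumption: brackets nest at most one level deep, e.g. [..[]..] or [..[]..[]..].
ONE_LEVEL = re.compile(
    r"""
    (?:
        [^\[\]|]                # any character except a bracket or a pipe
      | \[                      # or a whole bracketed group ...
          (?:
              [^\[\]]           # ... whose content may contain pipes
            | \[ [^\[\]]* \]    # ... and at most one level of inner brackets
          )*
        \]
    )*
    """,
    re.VERBOSE,
)


def before_first_top_level_pipe(text: str) -> str:
    """Everything up to (not including) the first pipe outside square brackets."""
    # match() anchors at the start and the pattern may match the empty string,
    # so it always succeeds; rstrip() drops the space that precedes the pipe.
    return ONE_LEVEL.match(text).group().rstrip()


print(before_first_top_level_pipe("a [x | [y] ] z | w"))  # a [x | [y] ] z
```

A deeper level of nesting would need another hand-written layer, which is where a recursive pattern scales better.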
I admit I don't think I fully understand positive and negative lookarounds, but any help would be greatly appreciated!
In this case, it is more efficient to match everything that is not a delimiter than to try to split:
(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*
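Details (this commented layout is itself a valid pattern in free-spacing mode, i.e. with the x modifier):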
(?=[^|])        # lookahead: ensure there's at least one non-pipe character at the
                # current position; the goal is to avoid an empty match.
[^][|]*         # everything that isn't a bracket or a pipe
(?:
    (           # open capture group 1: describes a bracketed part
        \[
        [^][]*+ # everything that isn't a bracket (you don't have to care
                # about the pipe here, since you are between brackets)
        (?:
            (?1)    # refer to the capture group 1 subpattern (it's a recursion,
                    # since this reference is inside capture group 1 itself)
            [^][]*
        )*+
        ]
    )           # close capture group 1
    [^][|]*
)*
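Python's standard re module cannot run the recursive (?1) pattern, so as a hedged, regex-free cross-check here is a minimal bracket-depth scanner that splits on pipes outside brackets at any depth (the name split_top_level is hypothetical). Assuming balanced brackets, its output should correspond to the non-empty matches the pattern returns when applied globally; the first element is the text before the first unbracketed pipe, including the space before that pipe.

```python
from typing import List


def split_top_level(text: str) -> List[str]:
    """Split text on pipes outside square brackets, at any nesting depth.

    Assumes balanced brackets. Empty segments are dropped, mirroring the
    (?=[^|]) guard in the pattern.
    """
    segments: List[str] = []
    current: List[str] = []
    depth = 0  # number of '[' currently open
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "|" and depth == 0:
            # Unbracketed pipe: close the current segment and drop the pipe.
            segments.append("".join(current))
            current = []
            continue
        current.append(ch)
    segments.append("".join(current))
    return [s for s in segments if s]


# Hypothetical sample with two levels of nesting.
print(split_top_level("x [a | [b | c] ] | y"))  # ['x [a | [b | c] ] ', ' y']
```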
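If you also need the empty parts, you can rewrite it like this (the extra alternative (?<=\|) yields an empty match just after a pipe wherever the first branch cannot match, i.e. between two consecutive pipes or after a trailing pipe):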
(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*|(?<=\|)