PCRE Regex - Match everything up to the first pipe not enclosed by square brackets

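I have the following line of text, from which I am trying to extract everything up to the first pipe character that is not enclosed in square brackets.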

action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$" | stats values(savedsearch_name) AS search_name

Expected output:

action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$"

That is, everything except the trailing | stats values(savedsearch_name) AS search_name

Based on some lookaround examples, I can (almost) get what I need with a JavaScript regex:

/.*\|(?![^\[]*\])/g - http://refiddle.com/refiddles/596dec4c75622d608f290000

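But this does not translate well into a valid PCRE-compatible expression (also, I would like to capture everything up to, but not including, the first pipe).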

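From what I have read, the nested square brackets inside the first set of square brackets may be a complication that cannot be solved? There is only one level of nested brackets in any given set (e.g. [..[]..][..[]..[]..]).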

I admit I don't think I fully understand positive and negative lookarounds, but any help would be appreciated!

In this kind of situation, it is more efficient to match everything that isn't a delimiter than to try to split:

(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*


Details:

(?=[^|]) # lookahead: ensure there's at least one non-pipe character at the
         # current position; the goal is to avoid an empty match.
[^][|]* # all that isn't a bracket or a pipe
(?:
    (  # open the capture group 1: describe a bracket part
        \[
         [^][]*+ # all that isn't a bracket (note that you don't have to care
                 # about the pipe here, you are between brackets)
         (?:
             (?1)  # refer to the capture group 1 subpattern (it's a recursion
                   # since this reference is in the capture group 1 itself)
             [^][]* 
         )*+
         ]
    ) # close the capture group 1
    [^][|]*
)*
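
For engines without recursion support (Python's standard `re` module, for example, has no `(?1)`), the same idea can be sketched with the recursion unrolled by hand. This is only a rough equivalent under the assumption stated in the question that brackets never nest deeper than two levels; the names `TOP_LEVEL_PART` and `first_part` are made up for this illustration.

```python
import re

# Unrolled, non-recursive variant of the pattern above.
# Assumption: brackets nest at most two levels deep ([..[]..]).
TOP_LEVEL_PART = re.compile(
    r"""
    [^\]\[|]*                 # text that is neither a bracket nor a pipe
    (?:
        \[                    # opening bracket of an outer group
        [^\]\[]*              # anything but brackets (pipes are fine here)
        (?:
            \[ [^\]\[]* \]    # one inner group, no deeper nesting handled
            [^\]\[]*
        )*
        \]                    # closing bracket of the outer group
        [^\]\[|]*
    )*
    """,
    re.VERBOSE,
)


def first_part(text: str) -> str:
    """Return the text before the first pipe that is outside all brackets."""
    # The pattern can match the empty string, so match() always succeeds.
    return TOP_LEVEL_PART.match(text).group(0)


line = (
    'action=search sourcetype=audittrail '
    '[ localop | stats count | eval search_id = '
    'replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] '
    '[ localop | stats count | eval earliest = '
    '$top10_drilldown_earliest$ - 86400 | table earliest ] '
    'latest="$top10_drilldown_latest$" '
    '| stats values(savedsearch_name) AS search_name'
)

# The match keeps the space that precedes the pipe, so strip it if unwanted.
print(first_part(line).rstrip())
```

Unlike the recursive pattern, this sketch stops at the first bracket group nested three or more levels deep, so it should only be used when the depth limit really holds.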

If you need the empty parts too, you can rewrite it like this:

(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*|(?<=\|)
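
If no regex engine with recursion is at hand, splitting on top-level pipes can also be sketched with a plain depth counter instead of a pattern. This is a different technique from the regex above, shown only for comparison; `split_top_level` and its `keep_empty` flag are hypothetical names, and the handling of empty parts at the very start of the line may differ from what the pattern produces.

```python
from typing import Iterator


def split_top_level(text: str, keep_empty: bool = True) -> Iterator[str]:
    """Yield the parts of text separated by pipes that are outside all [...]."""
    depth = 0   # current bracket nesting level
    start = 0   # index where the current part begins
    for i, ch in enumerate(text):
        if ch == "[":
            depth += 1
        elif ch == "]":
            # Clamp at zero so a stray ']' does not hide later pipes.
            depth = max(depth - 1, 0)
        elif ch == "|" and depth == 0:
            part = text[start:i]
            if part or keep_empty:
                yield part
            start = i + 1
    # Whatever follows the last top-level pipe is the final part.
    part = text[start:]
    if part or keep_empty:
        yield part


print(list(split_top_level("a [b|c] || d")))
print(list(split_top_level("a [b|c] || d", keep_empty=False)))
```

The counter handles any nesting depth, and the first yielded item is the part the question asks for.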