Regex -- ignoring the line up to a given point

Question

I have a regular expression (Perl-compatible) that works on part of my data. Given the log entry:

pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I can use the regex [\>\:]*\s+(.*?)\:?\s\<(.+?)\> to get the results I want (http://regexr.com/3fatg):

Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob

Unfortunately, when I built this regex I overlooked an important part of the log: the beginning. The log actually looks like this:

Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

My extraction is no longer correct -- it gets thrown off by the first part (http://regexr.com/3fbod). How do I exclude the leading information from this log line?

**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I think I need to start searching after the last occurrence of ] (just before pam_vas), but I can't figure out how to exclude it.

Answer 1

You can achieve this with:

\b                 # a word boundary
(?P<key>[\w(): ]+) # the key part - word characters, (, ), :, spaces
\h+                # one or more horizontal whitespace characters
<(?P<value>[^>]+)> # the value part in <> brackets

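See a demo on regex101.com. This way, the timestamp and host prefix do not need to be skipped explicitly. Note that the first key then comes out as "pam_vas: Authentication", and keys such as "user:" keep their trailing colon.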
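As a rough illustration, the pattern above can be tried from Python. This is only a sketch under a few assumptions: Python's re module has no \h, so [ \t]+ is swapped in for it; the names PAIR, LINE and extract_pairs are made up for this example; and the key cleanup (dropping the "pam_vas: " prefix and the trailing colon) is an extra step added here, not part of the answer.

```python
import re

# Port of the pattern above to Python's `re` module. Python does not
# support \h, so [ \t]+ (space or tab) stands in for it here.
PAIR = re.compile(r"\b(?P<key>[\w(): ]+)[ \t]+<(?P<value>[^>]+)>")

# Sample line taken from the question.
LINE = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

PREFIX = "pam_vas: "


def extract_pairs(line):
    """Return a list of (key, value) tuples found in one log line."""
    pairs = []
    for match in PAIR.finditer(line):
        key = match.group("key")
        # Assumed cleanup step (not part of the answer): the first key
        # is matched together with the "pam_vas: " prefix, and keys
        # keep their trailing colon, so both are trimmed here.
        if key.startswith(PREFIX):
            key = key[len(PREFIX):]
        key = key.rstrip(":")
        pairs.append((key, match.group("value")))
    return pairs


for key, value in extract_pairs(LINE):
    print(f"{key} = {value}")
```

The timestamp and host part never match because nothing there is followed by whitespace and a < bracket, which is why no explicit skipping is needed.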

Answer 2

Update: I misread the question. The most suitable regex seems to be

(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>

I played with several variants, but this one turned out to be the fastest. It consumes the date stamp without capturing it.

This may suffice (?:^\*\*[^*]*\*\*[ ]pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>

Unless you are using an ignore-whitespace (extended) mode, you can drop the square brackets around the single space, i.e. replace [ ] with a plain space.

There are shorter variants, but they either capture too much or take many more steps to complete: roughly 500-800 steps for every alternative I found, versus 104 here.

(?:        # Opens non-capturing group (NCG)
^          # Start of line, you may actually not want this
\*\*       # Literally **
[^*]*      # Anything but *, as many times as possible
\*\*       # Literally **
[ ]        # A single space, only in brackets for visibility
pam_vas:   # Literally pam_vas:
)          # Closes NCG
?          # Iterates NCG 0 or 1 times, thus "optional"
\s+        # One or more whitespace characters
(          # Opens capturing group 1 (CG1)
[^<:]*     # Any character but < or :, as many times as possible
)          # Closes CG1
:?         # :, 0 or 1 times
[ ]        # A single space, only in brackets for visibility
<          # Literally <
(          # Opens CG2
[^>]*      # Any character but >, as many times as possible
)          # Closes CG2
>          # Literally >
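To check the behaviour outside a regex tester, here is a small Python sketch using the first pattern from this answer, (?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>. The names PAIR, FIELDS, SHORT, FULL and parse are assumptions made for this example, and Python's re module is used in place of PCRE, which should not matter for this particular pattern.

```python
import re

# Pattern copied from the answer. The optional non-capturing group
# swallows everything up to "pam_vas:" on the first match only.
PAIR = re.compile(r"(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>")

# Sample data from the question, with and without the syslog prefix.
FIELDS = (
    "Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)
SHORT = "pam_vas: " + FIELDS
FULL = "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] " + SHORT


def parse(line):
    """Map each key to its value; findall yields (key, value) tuples."""
    return dict(PAIR.findall(line))


for key, value in parse(FULL).items():
    print(f"{key} = {value}")

# Both forms of the line should give the same mapping.
print(parse(FULL) == parse(SHORT))
```

Because ^ only matches at the start of the string here, the prefix group can apply to the first key/value pair only; every later pair is matched with the group skipped.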

Answer 3

After talking to someone on the Splunk forums, I ended up with this regex:

\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>

http://regexr.com/3fbpb
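A quick way to try this pattern on several log lines is sketched below in Python. The names PAIR, FULL, OTHER and parse_lines are hypothetical, the second sample line is invented for the example, and Python's re module stands in for the PCRE engine that Splunk uses.

```python
import re

# Pattern copied from above. The key group excludes ':', '<' and '>',
# so no key can stretch across the colons in the syslog prefix.
PAIR = re.compile(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>")

# First line is the sample from the question; the second is an
# invented line without any <...> values.
FULL = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)
OTHER = "Feb 16 20:04:38 hostname su[1111]: session closed"


def parse_lines(lines):
    """Yield one list of (key, value) tuples per input line."""
    for line in lines:
        yield PAIR.findall(line)


for pairs in parse_lines([FULL, OTHER]):
    print(pairs)
```

Nothing in the prefix is followed by whitespace and a < bracket, so the first successful match starts at Authentication, and a line with no bracketed values simply yields an empty list.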

regex

key-value