Regex -- ignoring the line up to a given point

I have a (Perl-compatible) regex that works on part of my data. Given the log entry:

pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I can use the regex [\>\:]*\s+(.*?)\:?\s\<(.+?)\> to get the results I want. (http://regexr.com/3fatg)

Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob

Unfortunately, when I built this regex I overlooked an important part of the log: the first part. The log actually looks like this:

Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

My extraction is no longer correct -- it is thrown off by the first part. (http://regexr.com/3fbod) How can I exclude the leading information from this log line?

**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I think I need to start searching after the last occurrence of ] (just before pam_vas), but I don't know how to exclude what comes before it.
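To make the failure concrete, here is a minimal Python sketch that runs the regex from the question against both the short entry and the full line. Python's `re` module stands in for the Perl-compatible engine, and the names `PATTERN`, `SHORT_ENTRY`, `PREFIX` and `extract` are made up for illustration.

```python
import re

# The regex from the question, unchanged.
PATTERN = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

SHORT_ENTRY = (
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)
# Leading part of the real log line that was overlooked at first.
PREFIX = "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "


def extract(line):
    """Return the (key, value) tuples found by PATTERN."""
    return PATTERN.findall(line)


short_pairs = extract(SHORT_ENTRY)
full_pairs = extract(PREFIX + SHORT_ENTRY)

for key, value in short_pairs:
    print(f"{key} = {value}")

# On the full line the lazy (.*?) starts right after "Feb" and keeps
# growing until the first " <", so the first key swallows the prefix.
print(repr(full_pairs[0][0]))
```

Only the first pair is affected on the full line; the remaining five pairs come out the same as for the short entry.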

You can achieve this with:

\b                 # a word boundary
(?P<key>[\w(): ]+) # the key part - word characters, (, ), :, spaces
\h+                # one or more horizontal whitespace characters
<(?P<value>[^>]+)> # the value part in <> brackets

See a demo on regex101.com. This way, nothing needs to be ignored.
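The pattern above can be tried from Python with one change: Python's `re` module does not support `\h`, so this sketch substitutes `[ \t]+` for it. That substitution is my assumption, not part of the answer, and `LINE`, `PATTERN` and `pairs` are illustrative names.

```python
import re

LINE = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

PATTERN = re.compile(
    r"""
    \b                   # a word boundary
    (?P<key>[\w(): ]+)   # the key part: word characters, (, ), :, spaces
    [ \t]+               # assumed stand-in for horizontal whitespace
    <(?P<value>[^>]+)>   # the value part in angle brackets
    """,
    re.VERBOSE,
)

# Collect the named groups of every match, left to right.
pairs = [(m.group("key"), m.group("value")) for m in PATTERN.finditer(LINE)]

for key, value in pairs:
    print(f"{key!r} = {value!r}")
```

Run as written, the timestamp and host prefix are skipped, but the first key comes back as `pam_vas: Authentication` and the colon-terminated keys keep their colon (`user:`, `account:`), so some cleanup of the keys may still be wanted.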

Update: I misread the question; the most suitable regex seems to be

(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>


I played with a few variants and found this one to be the fastest; it matches the date stamp and discards it.
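As a quick check of the updated pattern, here is a hedged Python sketch that applies it to the full log line. Python's `re` stands in for the PCRE engine, and `LINE`, `PATTERN` and `pairs` are names chosen for this example only.

```python
import re

LINE = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# The optional non-capturing group consumes everything up to and
# including "pam_vas:" on the first match; later matches skip it
# because ^ can only match at the start of the string.
PATTERN = re.compile(r"(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>")

pairs = PATTERN.findall(LINE)

for key, value in pairs:
    print(f"{key} = {value}")
```

Because the prefix sits in a non-capturing group, `findall` returns only the two capturing groups, so the date stamp never shows up in the results.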

This may suffice: (?:^\*\*[^*]*\*\*[ ]pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>

Unless you are using something with the ignore-whitespace option, you can drop the square brackets around the single space, [ ].

There are shorter variants, but their drawback is that they either capture too much or take many steps to complete: roughly 500-800 steps for everything I found, versus 104 here.

(?:          # Opens non-capturing group (NCG)
  ^          # Start of line; you may actually not want this
  \*\*       # Literally **
  [^*]*      # Anything but *, as many times as possible
  \*\*       # Literally **
  [ ]        # A single space, only in brackets for visibility
  pam_vas:   # Literally pam_vas:
)            # Closes NCG
?            # Iterates NCG 0 or 1 times, thus "optional"
\s+          # One or more whitespace characters
(            # Opens Capturing Group 1 (CG1)
  [^<:]*     # Any character but < or :, as many times as possible
)            # Closes CG1
:?           # :, 0 or 1 times
[ ]          # A single space, only in brackets for visibility
<            # Literally <
(            # Opens CG2
  [^>]*      # Any character but >, as many times as possible
)            # Closes CG2
>            # Literally >
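The remark about the bracketed space can be illustrated with Python's `re.VERBOSE` flag, which plays the role of an ignore-whitespace option here. This is only a sketch: `COMPACT`, `VERBOSE` and `LINE` are hypothetical names, and the input assumes the line really does start with literal `**` markers, as this pattern expects.

```python
import re

LINE = (
    "**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** "
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# One-line form of the pattern, as given in the answer.
COMPACT = re.compile(
    r"(?:^\*\*[^*]*\*\*[ ]pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>"
)

# Same pattern spread out. In verbose mode, whitespace outside a
# character class is ignored, so each literal space stays inside [ ].
VERBOSE = re.compile(
    r"""
    (?:                # optional non-capturing group for the prefix
        ^ \*\*         # start of line, then the opening asterisks
        [^*]* \*\*     # anything but an asterisk, then the closing pair
        [ ] pam_vas:   # one bracketed space, then the literal tag
    )?
    \s+ ([^<:]*)       # whitespace, then group 1: the key
    :? [ ]             # optional colon, then one bracketed space
    < ([^>]*) >        # group 2: the value between angle brackets
    """,
    re.VERBOSE,
)

compact_pairs = COMPACT.findall(LINE)
verbose_pairs = VERBOSE.findall(LINE)

for key, value in verbose_pairs:
    print(f"{key} = {value}")
```

Both forms return the same six pairs on this input; without the brackets, the spaces in the verbose form would be dropped from the pattern.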

After talking with someone on the Splunk forum, I ended up with this regex:

\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>

http://regexr.com/3fbpb
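For comparison, here is a small Python sketch of that last regex applied to the full log line. It uses Python's `re` rather than Splunk's own extraction, so treat it as an approximation; `LINE`, `PATTERN` and `pairs` are illustrative names.

```python
import re

LINE = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# The key group excludes ":", "<" and ">", so a key cannot stretch
# back across the colons in the timestamp, "su[1111]:" or "pam_vas:".
PATTERN = re.compile(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>")

pairs = PATTERN.findall(LINE)

for key, value in pairs:
    print(f"{key} = {value}")
```

In this sketch no explicit prefix handling is needed: excluding the colon from the key appears to be what keeps the leading part of the line out of the first match.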