Regex -- ignoring the line up to a given point
I have a regex (Perl-compatible) that works on part of my data.
Given the log entry:
pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
I can use the regex: [\>\:]*\s+(.*?)\:?\s\<(.+?)\>
to get the results I want. (http://regexr.com/3fatg)
Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob
Unfortunately, when I built this regex I overlooked an important part of the log: the first part.
The log actually looks like this:
Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
My extraction is no longer correct -- it is thrown off by the first part. (http://regexr.com/3fbod)
How can I exclude the leading information from this log line?
**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
I think I need to start searching after the last occurrence of ] (just before pam_vas), but I can't figure out how to exclude it.
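To make the problem concrete, here is a minimal Python sketch that runs the original pattern against both forms of the log line. Python's built-in `re` module is an assumption here (the question only says "Perl-compatible"), and cutting the line at the literal text `pam_vas:` is just one illustrative way to skip the prefix, not something taken from the thread.

```python
import re

# The original pattern from the question, unchanged.
ORIGINAL = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

SHORT = ("pam_vas: Authentication <succeeded> for <active directory> "
         "user: <bobtheperson> account: <bobtheperson@com.com> "
         "reason: <N/A> Access cont(upn): <bob>")
FULL = "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] " + SHORT

short_pairs = ORIGINAL.findall(SHORT)
full_pairs = ORIGINAL.findall(FULL)

# On the short line the first key is clean; on the full line the lazy
# group swallows the timestamp, host and process prefix as well.
print(short_pairs[0])
print(full_pairs[0])

# Illustrative workaround (an assumption, not from the thread): drop
# everything up to and including "pam_vas:" before matching.
# str.partition returns an empty tail if the separator is missing.
_, _, tail = FULL.partition("pam_vas:")
tail_pairs = ORIGINAL.findall(tail)
print(tail_pairs[0])
```

Only the first pair is affected by the prefix; the remaining pairs come out the same for both lines.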
You can achieve this with:
\b # a word boundary
(?P<key>[\w(): ]+) # the key part - word characters, (, ), :, spaces
\h+ # at least one whitespace (can be more)
<(?P<value>[^>]+)> # the value part in <> brackets
See a demo on regex101.com. This way, nothing needs to be ignored.
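A rough Python rendering of this named-group pattern may help when trying it outside regex101. Two things here are assumptions: Python's built-in `re` does not accept `\h`, so `[ \t]` is swapped in for it, and the final cleanup step is a hypothetical addition. With this character class the raw keys keep their trailing colon, and the first key also carries the `pam_vas: ` text.

```python
import re

# Same idea as the pattern above; [ \t] replaces \h, which Python's
# built-in re module does not support (substitution is an assumption).
KEY_VALUE = re.compile(r"\b(?P<key>[\w(): ]+)[ \t]+<(?P<value>[^>]+)>")

FULL = ("Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
        "pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")

raw = [(m.group("key"), m.group("value")) for m in KEY_VALUE.finditer(FULL)]

# Hypothetical cleanup: strip the trailing colon and keep only the text
# after the last ": " so "pam_vas: Authentication" becomes "Authentication".
cleaned = {key.rstrip(":").split(": ")[-1]: value for key, value in raw}

for key, value in cleaned.items():
    print(f"{key} = {value}")
```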
Update: I misread the question; the most suitable regex seems to be
(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>
I played with several variants but found this one to be the fastest; it matches and skips over the date stamp.
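As a quick check of that pattern, the sketch below applies it to the full log line and prints the pairs in the `key = value` form used in the question. Using Python's `re` is an assumption; the pattern itself is copied as written.

```python
import re

# Pattern from the update above, copied unchanged. The optional
# non-capturing group consumes everything up to "pam_vas:" on the
# first match and is simply skipped on later matches.
SKIP_PREFIX = re.compile(r"(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>")

FULL = ("Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
        "pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")

lines = [f"{m.group(1)} = {m.group(2)}" for m in SKIP_PREFIX.finditer(FULL)]
print("\n".join(lines))
```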
This may suffice (?:^\*\*[^*]*\*\*[ ]pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>
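Unless you are using something with an ignore-whitespace mode, you can drop the square brackets around the single spaces, i.e. change [ ] to a plain space.
There are shorter variants, but they either capture too much or take many more steps to complete: roughly 500-800 steps for every one I found, versus 104 here.
(?: # Opens non-capturing group (ncg)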
^ # ^ start of line, you may actually not want this
\*\* # Literally **
[^*]* # Anything but *, as many times as possible
\*\* # Literally **
[ ] # A single space, only in brackets for visibility
pam_vas: # Literally pam_vas:
) # Closes NCG
? # Iterates NCG 0 or 1 times, thus "optional"
\s+ # Any number of space characters, one or more
( # Opens Capturing Group 1
[^<:]* # Any Character but < or :, as many times as possible
) # Closes CG1
:? # :, 0 or 1 times
[ ] # A single space, only in brackets for visibility
< # Literally <
( # Opens CG2
[^>]* # Any character but >, as many times as possible
) # Closes CG2
> # Literally >
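The commented breakdown maps fairly directly onto an ignore-whitespace mode. The sketch below assumes Python's `re.VERBOSE` as that mode and a test line that really contains the literal `**` markers; in that mode the `[ ]` brackets are required, because bare spaces in the pattern are ignored.

```python
import re

# Verbose form of the pattern; re.VERBOSE ignores unbracketed
# whitespace and treats "#" to end of line as a comment.
PATTERN = re.compile(r"""
    (?:             # optional non-capturing prefix group
        ^           # start of line
        \*\*        # literal **
        [^*]*       # anything but *
        \*\*        # literal **
        [ ]         # a single space (brackets needed in verbose mode)
        pam_vas:    # literal pam_vas:
    )?
    \s+             # one or more whitespace characters
    ([^<:]*)        # group 1: the key
    :?              # optional colon
    [ ]             # a single space
    <([^>]*)>       # group 2: the value between < and >
""", re.VERBOSE)

# Assumption: the line includes the literal ** markers around the prefix.
LINE = ("**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** "
        "pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson>")

pairs = PATTERN.findall(LINE)
print(pairs)
```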
After talking with someone on the Splunk forum, I have this regex:
\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>
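For comparison, a small sketch of this last pattern collecting the pairs into a dictionary. Python's `re` is an assumption here; in Splunk the pattern would be used differently. Because the key class excludes `:`, `<` and `>`, no match can span the timestamp or the `pam_vas:` prefix.

```python
import re

# Pattern from the Splunk forum, copied unchanged. Only the two
# capturing groups are returned by findall; the (?:...) group is not.
SPLUNK = re.compile(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>")

FULL = ("Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
        "pam_vas: Authentication <succeeded> for <active directory> "
        "user: <bobtheperson> account: <bobtheperson@com.com> "
        "reason: <N/A> Access cont(upn): <bob>")

fields = dict(SPLUNK.findall(FULL))
print(fields)
```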