Inverse Match Help in Python
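Hello, I am looking to trim a McAfee log file and remove all of the "is OK" lines and other reported instances that I don't want to see. Previously we used a shell script that relied on grep's -v option, but now we are looking for a Python script that can run on both Linux and Windows. After several attempts I was able to get a regex working in an online regex builder, but I am having a hard time implementing it in my script.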
Online REGEX Builder
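Edit: I want to remove the "is OK", "is a broken", "is a block" and "file could not be opened" lines, so that I am left with a file containing only the issues I am interested in. In shell it looks something like this: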
grep -v "is OK" ${OUTDIR}/${OUTFILE} | grep -v "is a broken" | grep -v "file could not be opened" | grep -v "is a block" > ${OUTDIR}/${OUTFILE}.trimmed 2>&1
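For reference, a rough Python equivalent of that pipeline might look like the sketch below. It is an assumption-based port, not the asker's code: the names trim_lines and trim_log and the command-line usage are hypothetical. It skips any line containing one of the four phrases, which is what the chained grep -v stages do; unlike the shell redirect 2>&1, it does not send error output into the trimmed file.

```python
import sys

# Phrases to drop, one per "grep -v" stage of the shell pipeline.
EXCLUDED = ("is OK", "is a broken", "file could not be opened", "is a block")


def trim_lines(lines):
    """Yield the lines that contain none of the excluded phrases (like chained grep -v)."""
    for line in lines:
        if not any(phrase in line for phrase in EXCLUDED):
            yield line


def trim_log(in_path, out_path):
    """Write the surviving lines of in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(trim_lines(src))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python trim_log.py scan.log  ->  writes scan.log.trimmed
    trim_log(sys.argv[1], sys.argv[1] + ".trimmed")
```

Chaining four grep -v calls is equivalent to a single pass that rejects a line as soon as any phrase is found, which is what any() does here.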
Here is where I read in and search the file:
import re
f2 = open(outFilePath)
contents = f2.read()
print contents
p = re.compile("^((?!(is OK)|(file could not be opened)| (is a broken)|(is a block)))*$", re.MULTILINE | re.DOTALL)
m = p.findall(contents)
print len(m)
for iter in m:
    print iter
f2.close()
An example of the file I'm trying to search:
eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current -- ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016
AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
/tmp/tmp.BQshVRSiBo ... is OK.
/tmp/keyring-F6vVGf/socket ... file could not be opened.
/tmp/keyring-F6vVGf/socket.ssh ... file could not be opened.
/tmp/keyring-F6vVGf/socket.pkcs11 ... file could not be opened.
/tmp/yum.log ... is OK.
/tmp/tmp.oW75zGUh4S ... is OK.
/tmp/.X11-unix/X0 ... file could not be opened.
/tmp/tmp.LCZ9Ji6OLs ... is OK.
/tmp/tmp.QdAt1TNQSH ... is OK.
/tmp/ks-script-MqIN9F ... is OK.
/tmp/tmp.mHXPvYeKjb/mcupgrade.conf ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uninstall-uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/mcscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/install-uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/readme.txt ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uvscan_secure ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/signlic.txt ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/liblnxfv.so.4 ... is OK.
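But I am not getting the correct output. I also tried removing the MULTILINE and DOTALL options, but I still don't get the right result. Below is the output when running with DOTALL and MULTILINE: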
9
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
Any help would be greatly appreciated!! Thanks!!
Perhaps think of it more simply, one line at a time:
import re
import sys

pattern = re.compile(r"(is OK)|(file could not be opened)|(is a broken)|(is a block)")

with open(sys.argv[1]) as handle:
    for line in handle:
        if not pattern.search(line):
            sys.stdout.write(line)
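If the list of phrases is likely to grow, one option (a sketch, not part of the answer above) is to build the alternation from a list with re.escape, so that characters such as the trailing "." in "is OK." are matched literally rather than as "any character". The PHRASES list and the keep helper are hypothetical names.

```python
import re

# Hypothetical list of phrases; edit this list instead of the regex itself.
PHRASES = ["is OK.", "file could not be opened.", "is a broken", "is a block"]

# re.escape makes "." (and any other metacharacter) match literally.
PATTERN = re.compile("|".join(re.escape(phrase) for phrase in PHRASES))


def keep(line):
    """Return True if the line contains none of the phrases."""
    return PATTERN.search(line) is None


sample = [
    "eth0",
    "/tmp/yum.log ... is OK.",
    "/tmp/.X11-unix/X0 ... file could not be opened.",
    "Scanning for 670707 viruses, trojans and variants.",
]
kept = [line for line in sample if keep(line)]
print(kept)
```

Without re.escape, the pattern "is OK." would also match text such as "is OKx", because an unescaped "." matches any character.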
Output:
eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current -- ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016
AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
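Sometimes a regex is more complicated than it needs to be. If you really are just looking for these fixed phrases, I would probably try the simple approach: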
terms = (
    'is OK',
    'file could not be opened',
    'is a broken',
    'is a block',
)

with open('/tmp/sample.log') as f:
    for line in f:
        if line.strip() and not any(term in line for term in terms):
            print(line, end='')
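To check the speed question on your own data, a quick comparison might look like the sketch below. It runs on synthetic input (a few sample lines repeated, which is an assumption standing in for a real scan log), first confirms that the substring test and a regex alternation keep exactly the same lines, and then times each with timeit. Timings vary by machine and input, so none are quoted here.

```python
import re
import timeit

TERMS = ("is OK", "file could not be opened", "is a broken", "is a block")
PATTERN = re.compile(r"is OK|file could not be opened|is a broken|is a block")

# Synthetic log: a few lines from the sample, repeated to make timing measurable.
LINES = [
    "eth0\n",
    "/tmp/yum.log ... is OK.\n",
    "/tmp/keyring-F6vVGf/socket ... file could not be opened.\n",
    "No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY\n",
] * 5000


def with_substrings(lines):
    """Keep lines that contain none of the plain substrings."""
    return [line for line in lines if not any(term in line for term in TERMS)]


def with_regex(lines):
    """Keep lines in which the compiled alternation finds no match."""
    return [line for line in lines if not PATTERN.search(line)]


# Both filters must keep exactly the same lines before timing means anything.
assert with_substrings(LINES) == with_regex(LINES)

for func in (with_substrings, with_regex):
    seconds = timeit.timeit(lambda: func(LINES), number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```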
It may not be faster than a regex, but it is about as simple as it gets. Alternatively, you could use a slightly stricter approach:
terms = (
    'is a broken',
    'is a block',
)

with open('/tmp/sample.log') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        elif line.endswith('is OK.'):
            continue
        elif line.endswith('file could not be opened.'):
            continue
        elif any(term in line for term in terms):
            continue
        print(line)
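The practical difference between the two versions shows up when a phrase appears somewhere other than the end of a line. The sketch below compares them on a hypothetical line (a file whose name happens to contain "is OK", followed by a made-up status); loose_drop and strict_drop are illustrative names, and strict_drop follows the endswith checks from the code above.

```python
# First version: all phrases are matched anywhere in the line.
LOOSE_TERMS = ("is OK", "file could not be opened", "is a broken", "is a block")
# Second version: these two only count at the end of the stripped line...
STRICT_SUFFIXES = ("is OK.", "file could not be opened.")
# ...while these two are still matched anywhere.
STRICT_TERMS = ("is a broken", "is a block")


def loose_drop(line):
    """True if the first version would drop this line."""
    return any(term in line for term in LOOSE_TERMS)


def strict_drop(line):
    """True if the second version would drop this line (blank lines aside)."""
    line = line.strip()
    return line.endswith(STRICT_SUFFIXES) or any(term in line for term in STRICT_TERMS)


# Hypothetical line: the file name contains "is OK", but the status at the end does not.
odd_line = "/tmp/this is OK.txt ... hypothetical detection message\n"
print(loose_drop(odd_line))   # True: the first version hides this line
print(strict_drop(odd_line))  # False: the second version keeps it
```

str.endswith accepts a tuple of suffixes, which keeps the stricter check to a single call.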
Which approach I would choose depends largely on who I expect to use the script :)
Try this (it does it in one line):
p = re.compile("^(?:[if](?!s OK|s a broken|s a block|ile could not be opened)|[^if])*$")
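This means: within a line, an "i" or "f" is allowed only if it is not followed by the rest of one of the listed phrases, and any character other than "i" or "f" is always allowed. This check is repeated for every character in the line.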
Edit: After testing on regex101.com, I found why it wasn't working. Here is a one-line regex that does work:
p = re.compile(r"^(?:[^if\n]|[if](?!s OK|s a broken|s a block|ile could not be opened))*$", re.MULTILINE)
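A sketch of applying the one-line pattern above to the whole file contents at once, written as a raw string (the usual convention for regex literals). A short excerpt of the sample log stands in for the real file; it deliberately has no trailing newline, since a final empty line would otherwise come back as one extra empty match. Because the pattern has no capturing group, re.findall returns the kept lines themselves.

```python
import re

# Tempered pattern: every "i" or "f" must not start one of the excluded phrases.
p = re.compile(
    r"^(?:[^if\n]|[if](?!s OK|s a broken|s a block|ile could not be opened))*$",
    re.MULTILINE,
)

# Excerpt of the sample log (no trailing newline).
contents = (
    "eth0\n"
    "No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY\n"
    "/tmp/tmp.BQshVRSiBo ... is OK.\n"
    "/tmp/keyring-F6vVGf/socket ... file could not be opened."
)

kept = p.findall(contents)
print("\n".join(kept))
```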
I know this answer comes late, but I don't see a correct solution among the existing answers.
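Your regex is wrong in this case. It has unnecessary extra groups, and it is missing a "." after the lookahead, so the repeated group never consumes any characters. As a result, the lookahead is only tested at the beginning of the line, so it only detects "is OK|file could not be opened|is a broken" when the line starts with one of them:
"hello world is OK": not detected
"is OK hello world": detected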
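In an inverse match, use a non-capturing group '(?:)' instead of a capturing group '()'. Otherwise re.findall returns the (empty) contents of the groups instead of the matched lines.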
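The empty strings (and the asker's tuples of empty strings) come from how re.findall treats groups: if the pattern has capturing groups, it returns the group contents instead of the whole match, one string per match for a single group and a tuple for several. Groups inside a negative lookahead never participate in a successful match, so they come back empty. A small sketch on two sample lines:

```python
import re

text = "eth0\n/tmp/yum.log ... is OK."

# Two capturing groups inside the lookahead: findall returns a tuple of groups per match.
with_groups = re.findall(r"^(?!.*(?:(is OK)|(is a block))).*", text, flags=re.MULTILINE)
# Non-capturing group: findall returns the matched line itself.
without_groups = re.findall(r"^(?!.*(?:is OK|is a block)).*", text, flags=re.MULTILINE)

print(with_groups)     # [('', '')]
print(without_groups)  # ['eth0']
```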
If you want to remove the whole line, you can use an expression like this:
r"^(?!.*(?:is OK|is a broken|file could not be opened)).*"
"is OK. hello world": removed
"hello world is OK.": removed
"is OK.": removed
If you want to remove whole lines, but only those that end with "is OK.", "file could not be opened." or "is a broken.", you can use the following expression:
r"^(?!.*(?:is OK|is a broken|file could not be opened)\.$).*"
"is OK. hello world": kept
"hello world is OK.": removed
"is OK.": removed
Remember to use a non-capturing group '(?:)' instead of a capturing group '()'; otherwise you get a list of empty strings:
import re

# Capturing group (text holds the full contents of the log file)
regex = r"^(?!.*(is OK|file could not be opened|is a broken|is a block)).*"
print(re.findall(regex, text, flags=re.MULTILINE))
Output:
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
With a non-capturing group, use the join() function to get the full text back:
# Non-capturing group
regex = r"^(?!.*(?:is OK|file could not be opened|is a broken|is a block)).*"
print("\n".join(re.findall(regex, text, flags=re.MULTILINE)))
Output:
eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current -- ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016
AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS
No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
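The two whole-line expressions from this answer can be compared on its example lines with a short sketch. A line counts as kept when the anchored pattern matches it and as removed otherwise; the last line, which differs only in letter case, is an added illustration that matching is case-sensitive unless re.IGNORECASE is passed.

```python
import re

# Pattern A: drop any line that contains one of the phrases anywhere.
anywhere = re.compile(r"^(?!.*(?:is OK|is a broken|file could not be opened)).*")
# Pattern B: drop only lines that end with one of the phrases followed by a period.
at_end = re.compile(r"^(?!.*(?:is OK|is a broken|file could not be opened)\.$).*")

lines = ["is OK. hello world", "hello world is OK.", "is OK.", "is Ok."]

# Map each line to its (pattern A, pattern B) verdict.
results = {
    line: (
        "kept" if anywhere.match(line) else "removed",
        "kept" if at_end.match(line) else "removed",
    )
    for line in lines
}

for line, (verdict_a, verdict_b) in results.items():
    print(f"{line!r:24} A: {verdict_a:8} B: {verdict_b}")
```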