How to find lines which do not match a list of patterns?

Question

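I want to find all non-empty lines in a document that do not match a list of patterns. For example, in the document fragment below, I want a regular expression that matches lines 2, 4, 5, 6, 18, 19, 20 and 21.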

I want to exclude lines like 8, 10, 12, 14 and 16, as well as all empty lines.

The inverse pattern is (?i)^.*02 December_|^\s*Python Proprietary|^\s*Python Regular Expression Specification|^.*page\s+\d+|^\s*$. I want a pattern that matches every line that the pattern above does not match.

 1:
 2:This module provides regular expression matching operations.
 3:
 4:Regular expressions use the backslash character ('\') to indicate special forms
 5:or to allow special characters to be used without invoking their special
 6:meaning.
 7:
 8:Python Regular Expression                                           02 December 1999 
 9:
10:                                                                 Python Proprietary 
11:
12:----------------------- Page 292-----------------------
13:
14:PYTHON RE SPECIFICATION Version 2.7 [Vol 9, Part Q]                     page 983 
15:
16:Python Regular Expression Specification 
17:
18:It is important to note that most regular expression operations are available as
19:module-level functions and RegexObject methods. The functions are shortcuts that
20:don’t require you to compile a regex object first, but miss some fine-tuning
21:parameters.
22:

P.S.:

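I am using re.match().
The actual document has no line numbers at the start of each line; they were added to this snippet only to make discussion easier.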
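Since the asker already calls re.match() on each line, one minimal sketch is to keep the inverse pattern and simply negate the test. It is written for Python 3 and uses the snippet above without the added line numbers. It assumes the trailing underscore in 02 December_ is a typo: line 8 contains 02 December 1999, so that alternative as written would never match it.

```python
import re

# The snippet from the question, without the added line numbers.
SAMPLE = """
This module provides regular expression matching operations.

Regular expressions use the backslash character ('\\') to indicate special forms
or to allow special characters to be used without invoking their special
meaning.

Python Regular Expression                                           02 December 1999

                                                                 Python Proprietary

----------------------- Page 292-----------------------

PYTHON RE SPECIFICATION Version 2.7 [Vol 9, Part Q]                     page 983

Python Regular Expression Specification

It is important to note that most regular expression operations are available as
module-level functions and RegexObject methods. The functions are shortcuts that
don’t require you to compile a regex object first, but miss some fine-tuning
parameters.
"""

# The question's inverse pattern. Assumption: "02 December_" is a typo for
# "02 December", because line 8 contains "02 December 1999" (no underscore).
INVERSE = re.compile(
    r"(?i)^.*02 December"
    r"|^\s*Python Proprietary"
    r"|^\s*Python Regular Expression Specification"
    r"|^.*page\s+\d+"
    r"|^\s*$"
)


def wanted_lines(text):
    """Return (line_number, line) pairs for lines the inverse pattern rejects."""
    result = []
    for number, line in enumerate(text.splitlines(), start=1):
        # re.match only tries position 0 of each line; keep the line when it fails.
        if not INVERSE.match(line):
            result.append((number, line))
    return result


print([number for number, _ in wanted_lines(SAMPLE)])  # [2, 4, 5, 6, 18, 19, 20, 21]
```

Negating the match in Python avoids having to express "none of these" inside a single regex, and each exclusion stays readable on its own.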

Answer 1

Try this:

^.*?Python Regular Expression.*?$(*SKIP)(*FAIL)|^.*?Python Proprietary.*?$(*SKIP)(*FAIL)|.*?Page \d+.*?$(*SKIP)(*FAIL)|^$(*SKIP)(*FAIL)|^.*?$


Result:

It matches 8 lines: 2, 4, 5, 6, 18, 19, 20 and 21.

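Explanation: each (*SKIP)(*FAIL) branch consumes an unwanted line and then forces the match to fail, and (*SKIP) makes the engine resume searching after that line instead of retrying inside it.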

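^.*?Python Regular Expression.*?$(*SKIP)(*FAIL) excludes lines 8 and 16.
^.*?Python Proprietary.*?$(*SKIP)(*FAIL) excludes line 10.
.*?Page \d+.*?$(*SKIP)(*FAIL) excludes lines 12 and 14 (line 14 only with case-insensitive matching, since it contains a lowercase "page").
^$(*SKIP)(*FAIL) excludes all empty lines.
^.*?$ matches all remaining lines.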
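Python's built-in re module does not support the PCRE verbs (*SKIP) and (*FAIL) (compiling them raises re.error); the third-party regex package does. A common standard-library stand-in is to let unwanted lines match a non-capturing branch and capture only the wanted lines in a final branch. The sketch below assumes multiline and case-insensitive matching, which Answer 1's demo presumably relied on (^ and $ per line, and lowercase "page 983" on line 14); the name UNWANTED_OR_WANTED is illustrative only.

```python
import re

# The snippet from the question, without the added line numbers.
SAMPLE = """
This module provides regular expression matching operations.

Regular expressions use the backslash character ('\\') to indicate special forms
or to allow special characters to be used without invoking their special
meaning.

Python Regular Expression                                           02 December 1999

                                                                 Python Proprietary

----------------------- Page 292-----------------------

PYTHON RE SPECIFICATION Version 2.7 [Vol 9, Part Q]                     page 983

Python Regular Expression Specification

It is important to note that most regular expression operations are available as
module-level functions and RegexObject methods. The functions are shortcuts that
don’t require you to compile a regex object first, but miss some fine-tuning
parameters.
"""

# Stand-in for (*SKIP)(*FAIL) using only the standard re module: the first three
# branches consume unwanted lines without capturing anything, and the last branch
# captures every other non-empty line in group 1. Empty lines match no branch.
# Assumption: MULTILINE (^/$ per line) and IGNORECASE (for "page 983" on line 14).
UNWANTED_OR_WANTED = re.compile(
    r"^.*?Python Regular Expression.*?$"
    r"|^.*?Python Proprietary.*?$"
    r"|^.*?Page \d+.*?$"
    r"|^(.+)$",
    re.MULTILINE | re.IGNORECASE,
)


def kept_lines(text):
    # Only matches produced by the last branch have group 1 set.
    return [m.group(1) for m in UNWANTED_OR_WANTED.finditer(text)
            if m.group(1) is not None]


for line in kept_lines(SAMPLE):
    print(line)
```

The idea matches Answer 1: decide what to discard first, then take everything else. Here the discarding happens in Python, by checking whether group 1 was set, rather than through backtracking verbs inside the regex engine.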

Answer 2

You can use a negative lookahead:

^(?i)(?!-+\s+Page\s+\d+-+|Python\s+Regular\s+Expression\s+\d{2}.+\d{4}|.+Python\s+Proprietary|PYTHON\s+RE SPECIFICATION\s+Version.+\s+page\s+\d+|Python\s+Regular\s+Expression\s+Specification).+$
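To run Answer 2's pattern from Python, one option is to pass the flags to re.compile() instead of writing (?i) after ^: Python 2.7 accepts a global inline flag in that position, but it has been deprecated since Python 3.6 and is rejected from Python 3.11 on. The sketch below also assumes re.MULTILINE so that ^ and $ apply to each line of the whole document; the lookahead alternatives themselves are unchanged.

```python
import re

# The snippet from the question, without the added line numbers.
SAMPLE = """
This module provides regular expression matching operations.

Regular expressions use the backslash character ('\\') to indicate special forms
or to allow special characters to be used without invoking their special
meaning.

Python Regular Expression                                           02 December 1999

                                                                 Python Proprietary

----------------------- Page 292-----------------------

PYTHON RE SPECIFICATION Version 2.7 [Vol 9, Part Q]                     page 983

Python Regular Expression Specification

It is important to note that most regular expression operations are available as
module-level functions and RegexObject methods. The functions are shortcuts that
don’t require you to compile a regex object first, but miss some fine-tuning
parameters.
"""

# Answer 2's negative lookahead, with the inline (?i) replaced by re.IGNORECASE.
# .+$ requires at least one character, so empty lines never match.
PATTERN = re.compile(
    r"^(?!-+\s+Page\s+\d+-+"
    r"|Python\s+Regular\s+Expression\s+\d{2}.+\d{4}"
    r"|.+Python\s+Proprietary"
    r"|PYTHON\s+RE SPECIFICATION\s+Version.+\s+page\s+\d+"
    r"|Python\s+Regular\s+Expression\s+Specification).+$",
    re.IGNORECASE | re.MULTILINE,
)

# Whole-document search: findall returns the text of every kept line.
for line in PATTERN.findall(SAMPLE):
    print(line)

# Per-line use with re.match, as in the question: report the kept line numbers.
numbers = [n for n, line in enumerate(SAMPLE.splitlines(), start=1) if PATTERN.match(line)]
print(numbers)  # [2, 4, 5, 6, 18, 19, 20, 21]
```

Because the lookahead sits directly after ^, a line is rejected only when it begins with one of the listed forms (the Proprietary alternative allows leading text through its .+).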


Tags: python, regex, regex-negation, python-2.7
