How to return a list of strings that do not match a specific pattern?

Question

I am trying to return every entry in a text file that does not match a specific pattern, but I am having trouble with the syntax.

The pattern is [A-Z]+\_[A-Z0-9]+\_[0-9]+\_[0-9]+\.[A-Z]{3}

I tried the following, without success:

'^(?![A-Z]+\_[A-Z0-9]+\_[0-9]+\_[0-9]+\.[A-Z]{3}$).*$'

r'^(?!([A-Z]+\_[A-Z0-9]+\_[0-9]+\_[0-9]+\.[A-Z]{3}).)*$'

Below is the code that matches the pattern. Now I need to find all the entries that do not match.

import re

pattern = r'[A-Z]+\_[A-Z0-9]+\_[0-9]+\_[0-9]+\.[A-Z]{3}'
regex1 = re.compile(pattern, flags=re.IGNORECASE)
regex1.findall(text1)

A sample of the data looks like this:

plos_annotate5_1375_1.txt plos_annotate5_1375_2.txt plos_anno%tate5_1375_3.txt plos_annotate6_1032_1.txt

The third string is the one I want to pull out.

Answer 1

You can simply check that your regex does not match:

if regex.match(text1) is None:
    # Do magic you need
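As a minimal sketch of that check applied to the sample data from the question: it assumes the entries are separated by whitespace and that a "match" means the whole entry fits the pattern, so it uses fullmatch rather than match (match only anchors at the start of the string). The names text1, regex1 and non_matching are used for illustration.

```python
import re

pattern = r'[A-Z]+_[A-Z0-9]+_[0-9]+_[0-9]+\.[A-Z]{3}'
regex1 = re.compile(pattern, flags=re.IGNORECASE)

# Sample data from the question, assumed to be whitespace-separated.
text1 = ("plos_annotate5_1375_1.txt plos_annotate5_1375_2.txt "
         "plos_anno%tate5_1375_3.txt plos_annotate6_1032_1.txt")

non_matching = []
for name in text1.split():
    # fullmatch returns None when the entire entry does not fit the pattern.
    if regex1.fullmatch(name) is None:
        non_matching.append(name)

print(non_matching)  # ['plos_anno%tate5_1375_3.txt']
```

Testing each entry on its own matters here: calling match on the whole text would succeed as soon as the first entry fits the pattern.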

Answer 2

Why do the negation inside the regex when you can do it in Python?

strings_without_rx = [s for s in the_strings if not regex1.search(s)]

If you want to scan the lines of a file, you don't even need to store them all, because an open file is an iterable of its lines:

with open("some.file") as source:
  lines_without_rx = [s for s in source if not regex1.search(s)]
# Here the file is auto-closed.
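Here is a small self-contained sketch of the same filter. It assumes one entry per line (the question's sample shows them on a single line, so this is only an illustration) and uses io.StringIO as a stand-in for the opened file. Note that lines read from a file keep their trailing newline, so the sketch strips it.

```python
import io
import re

regex1 = re.compile(r'[A-Z]+_[A-Z0-9]+_[0-9]+_[0-9]+\.[A-Z]{3}',
                    flags=re.IGNORECASE)

# Stand-in for open("some.file"); assumes one entry per line.
source = io.StringIO(
    "plos_annotate5_1375_1.txt\n"
    "plos_anno%tate5_1375_3.txt\n"
    "plos_annotate6_1032_1.txt\n"
)

# Keep only the lines in which the pattern is not found anywhere.
lines_without_rx = [line.rstrip("\n") for line in source
                    if not regex1.search(line)]

print(lines_without_rx)  # ['plos_anno%tate5_1375_3.txt']
```

Because search looks for the pattern anywhere in the line, a line that merely contains a valid name is treated as matching. Use fullmatch on the stripped line if the whole line has to fit.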

Answer 3

I suggest using a negative lookahead assertion with your pattern:

r'(?![A-Z]+\_[A-Z0-9]+\_[0-9]+\_[0-9]+\.[A-Z]{3}[^A-Za-z0-9_+\.-]+)'

Used with findall, this is meant to give you all the non-matching entries without any explicit loop:

re.findall(r'(?![A-Z]+\_[A-Z0-9]+\_[0-9]+\_[0-9]+\.[A-Z]{3}[^A-Za-z0-9_+\.-]+)', text1)
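One caveat: a lookahead on its own consumes no characters, so a pattern made only of a lookahead yields empty strings from findall. A possible variant, sketched below, pairs the negative lookahead with \S+ so that each rejected entry is actually captured. It assumes the entries are separated by whitespace, and the names name and not_name are mine, not from the question.

```python
import re

name = r'[A-Z]+_[A-Z0-9]+_[0-9]+_[0-9]+\.[A-Z]{3}'

# (?<!\S)          only start at the beginning of a whitespace-separated entry
# (?!name(?!\S))   reject the entry if it is, in full, a valid name
# \S+              otherwise consume and return the whole entry
not_name = re.compile(r'(?<!\S)(?!' + name + r'(?!\S))\S+',
                      flags=re.IGNORECASE)

text1 = ("plos_annotate5_1375_1.txt plos_annotate5_1375_2.txt "
         "plos_anno%tate5_1375_3.txt plos_annotate6_1032_1.txt")

result = not_name.findall(text1)
print(result)  # ['plos_anno%tate5_1375_3.txt']
```

This avoids an explicit Python loop, at the cost of a regex that is harder to read than the list comprehension in Answer 2.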
