re.compile only takes two arguments, is there a way to make it take more? Or another way around that?

Question

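I have access to emails stored as txt files on my computer, and my goal is to scrape specific data from them. I use re.compile and enumerate to parse each email for matching words (in my case, fish species such as GOM Cod) and then print them. But I need to parse 100+ emails, and each one lists several different fish species, so my question is: what is the best way to go about this? I can't put all 17 possible fish species into a single re.compile call, so should I just have 17 copies of the same code block, changing only the species in each one? Is that the most efficient approach? My code is below.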

import os
import email
import re

# Raw string, so that '\f' is not interpreted as a form-feed escape
path = r'Z:\folderwithemail'

# Compile the pattern once, outside the loops
pattern = re.compile("GOM Cod", re.IGNORECASE)

for filename in os.listdir(path):
    file_path = os.path.join(path, filename)
    if os.path.isfile(file_path):
        sector_result = []
        with open(file_path, 'r') as f:
            for linenum, line in enumerate(f):
                if pattern.search(line) is not None:
                    sector_result.append((linenum, line.rstrip('\n')))
        # Print after the file has been scanned, so each match appears once
        for linenum, line in sector_result:
            print("Fish Species:", line)

Answer 1

You can use the vertical bar | to alternate between the fish species:

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B

pattern = re.compile(r"GOM Cod|Salmon|Tuna", re.IGNORECASE)
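With 17 species, typing the alternation by hand gets error-prone, so one option is to build it from a list. The following is a minimal sketch, not the asker's actual setup: the species names other than GOM Cod, the helper names build_species_pattern and find_species, and the sample lines are all made up for illustration. It escapes each name with re.escape and joins them with |, putting longer names first because alternatives are tried left to right.

```python
import re

# Hypothetical species list; replace with the 17 real names.
SPECIES = ["GOM Cod", "GB Cod", "Haddock", "Pollock", "White Hake"]

def build_species_pattern(species):
    """Join the names into a single alternation: name1|name2|..."""
    # Longest names first, so a short name cannot win over a longer
    # name that starts with it (alternatives are tried left to right).
    ordered = sorted(species, key=len, reverse=True)
    joined = "|".join(re.escape(name) for name in ordered)
    return re.compile(joined, re.IGNORECASE)

def find_species(lines, pattern):
    """Return (line number, matched text) for every species mention."""
    hits = []
    for linenum, line in enumerate(lines):
        for match in pattern.finditer(line):
            hits.append((linenum, match.group(0)))
    return hits

pattern = build_species_pattern(SPECIES)

# Made-up sample lines standing in for the contents of one email file.
sample = ["Sector 5 report", "gom cod 1,200 lbs; Haddock 300 lbs", "No fish here"]
hits = find_species(sample, pattern)
for linenum, name in hits:
    print("Fish Species:", name, "on line", linenum)
```

Because find_species accepts any iterable of lines, an open file object can be passed in place of the sample list, and finditer reports every species on a line rather than only the first.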

Tags: python, screen-scraping, enumerate