How to apply a regex to a folder full of .txt files?

Question

I have this regex:

regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VMP\w+)'
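To make the target format concrete, here is a minimal sketch of what the three groups capture, run on a made-up single-line sample. The sample and its `word TAG` layout are assumptions, because the actual file contents are not shown here.

```python
import re

regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VMP\w+)'

# Hypothetical single-line sample: each word is followed by its tag.
sample = "no RN ha VAIP3S0 comido VMP00SM"

# findall returns one tuple per match, with one string per capture group.
matches = re.findall(regex_, sample)
print(matches)  # [('no RN', 'ha VAIP3S0', 'comido VMP00SM')]

# Keep only the word in front of each tag, as the question's list comprehension does.
words = [tuple(group.split()[0] for group in match) for match in matches]
print(words)  # [('no', 'ha', 'comido')]
```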

I would like to apply it to a folder full of txt files and return the matches of each document as a list, one list per line, like this:

[pattern of the regex 1]
[pattern of the regex 2]
...
[pattern of the regex n]
[pattern of the regex n-1]

So this is what I have tried:

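import glob
import os
import re

directory_ = '/Users/user/path/folder_txts/'
regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VMP\w+)'

def retrive(directory, a_regex):
    # Visit every .txt file in the directory (glob.glob returns them in no guaranteed order)
    for filename in glob.glob(os.path.join(directory, '*.txt')):
        with open(filename, 'r') as file:
            important_stuff = re.findall(a_regex, file.read())
            # Keep only the first token (the word) of each captured "word TAG" group
            my_list = [tuple([j.split()[0] for j in i]) for i in important_stuff]
            print my_list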

This is the output:

print retrive(directory_, regex_)
['']
['']
...
['']

This is wrong, because the output should look like this:

[('string', 'string', 'string'), ('string', 'string', 'string')]
[('string', 'string', 'string'), ('string', 'string', 'string')]
...
[('string', 'string', 'string'), ('string', 'string', 'string')]

How can I apply the regex above to every txt file in the directory and return the results as lists sorted alphabetically by file name? This is an example of a txt file.
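One way to get the per-file lists in alphabetical order is to sort the paths returned by `glob.glob` and collect the results instead of printing them inside the loop. This is a sketch, not the asker's code; `retrieve_sorted` and its `flags` argument are hypothetical names chosen for illustration.

```python
import glob
import os
import re


def retrieve_sorted(directory, a_regex, flags=0):
    """Return one list of word tuples per .txt file, in alphabetical path order.

    Hypothetical helper: `flags` is passed straight through to re.findall.
    """
    results = []
    # sorted() orders the paths; glob.glob itself guarantees no particular order.
    for filename in sorted(glob.glob(os.path.join(directory, '*.txt'))):
        with open(filename, 'r') as file:
            important_stuff = re.findall(a_regex, file.read(), flags)
        # Keep only the first whitespace-separated token (the word) of each group.
        results.append([tuple(j.split()[0] for j in i) for i in important_stuff])
    return results
```

For example, `for file_result in retrieve_sorted(directory_, regex_, re.S): print(file_result)` prints one list per file, one per line. Because all paths share the same directory prefix, sorting the full paths is equivalent to sorting by file name; note that the comparison is case-sensitive, so uppercase names come before lowercase ones.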

Answer 1

Hi, there is a problem with your regex.

Please use

regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VM\w+)'

instead of

regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VMP\w+)'

and, in the function, use

important_stuff = re.findall(a_regex, file.read(), re.S)
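The effect of `re.S` can be checked on a small in-memory string. This sketch assumes, hypothetically, that each `word TAG` pair sits on its own line, so the `.*?` gaps between the groups have to cross newlines.

```python
import re

regex_ = r'(\w+\s+RN).*?(\w+\s+VA\w+).*?(\w+\s+VM\w+)'

# Hypothetical file content with one "word TAG" pair per line.
text = "no RN\nha VAIP3S0\ncomido VMP00SM\n"

# Without re.S, '.' does not match '\n', so the lazy '.*?' gaps cannot cross line breaks.
without_flag = re.findall(regex_, text)
print(without_flag)  # []

# With re.S (an alias of re.DOTALL), '.' also matches '\n'.
with_flag = re.findall(regex_, text, re.S)
print(with_flag)  # [('no RN', 'ha VAIP3S0', 'comido VMP00SM')]
```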

python

regex

io

parsing

python-2.7