Streamlining a series of try-except + if statements for faster processing in Python

Question

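I'm processing strings with regular expressions across a bunch of files in a directory. For each line in a file, I apply a series of try statements to match a pattern and, if it matches, transform the input. After analyzing each line, I write it to a new file. I have a lot of these try-except blocks followed by if statements (I only include two here for illustration). My problem is that after processing a few files, the script slows down so much that it almost brings the process to a complete halt. I don't know what in my code is causing the slowdown, but I have a feeling it is the combination of try-except + if statements. How can I streamline the transformations so that the data is processed at a reasonable speed?

Or do I need a more efficient iterator that doesn't tax memory to the same extent?

Any feedback would be much appreciated!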

import re
import glob

fileCounter = 0 

for infile in glob.iglob(r'\input-files\*.txt'):

    fileCounter += 1
    outfile = r'\output-files\output_%s.txt' % fileCounter

    with open(infile, "rb") as inlist, open(outfile, "wb") as outlist:

        for inline in inlist:

            inword = inline.strip('\r\n')

            #apply some text transformations
            #Transformation #1
            try: result = re.match('^[AEIOUYaeiouy]([bcćdfghjklłmnńprsśtwzżź]|rz|sz|cz|dz|dż|dź|ch)[aąeęioóuy](.*\[=\].*)*', inword).group()
            except: result = None

            if result == inword:
                inword = re.sub('(?<=^[AEIOUYaeiouy])(?=([bcćdfghjklłmnńprsśtwzżź]|rz|sz|cz|dz|dż|dź|ch)[aąeęioóuy])', '[=]', inword)

            #Transformation #2 etc.
            try: result = re.match('(.*\[=\].*)*(\w?\w?)[AEIOUYaąeęioóuy]\[=\][ćsśz][ptkbdg][aąeęioóuyrfw](.*\[=\].*)*', inword).group()
            except: result = None

            if result == inword:   
                inword =  re.sub('(?<=[AEIOUYaąeęioóuy])\[=\](?=[ćsśz][ptkbdg][aąeęioóuyrfw])', '', inword)
                inword =  re.sub('(?<=[AEIOUYaąeęioóuy][ćsśz])(?=[ptkbdg][aąeęioóuyrfw])', '[=]', inword)

            outline = inword + "\n"
            outlist.write(outline)

    print "Processed file number %s" % fileCounter          
print "*** Processing completed ***"

Answer 1

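try/except is indeed not the most efficient (nor the most readable) way to test the result of re.match(), but the penalty should still stay (more or less) constant: performance should not degrade over the course of execution (unless some worst case occurs because of your data), so chances are the problem is elsewhere.

FWIW, you can start by replacing your try/except blocks with the appropriate canonical solution, i.e. instead of: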

try:
    result = re.match(someexp, yourline).group()
except:
    result = None

you want:

match = re.match(someexp, yourline)
result = match.group() if match else None

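This will slightly improve performance but, most importantly, it makes your code more readable and maintainable; at the very least it won't hide any unexpected errors.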
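To make that concrete, here is a minimal sketch of the same idea applied to a transformation step. The pattern is a simplified, hypothetical stand-in for the first one in the question (plain ASCII vowels and consonants only), and the names `VCV`, `matched_text` and `transform` are made up for illustration. Compiling the pattern once outside the loop is an extra convenience, not something the question's code does.

```python
import re

# Simplified, hypothetical stand-in for the question's first pattern:
# one vowel, one consonant, one vowel at the start of the word.
VCV = re.compile(r'^[aeiouy][bcdfghjklmnprstwz][aeiouy]')


def matched_text(pattern, line):
    """Return the text matched at the start of line, or None if no match."""
    match = pattern.match(line)
    return match.group() if match else None


def transform(word):
    """Insert '[=]' after the leading vowel when the match covers the whole word."""
    if matched_text(VCV, word) == word:
        return word[0] + '[=]' + word[1:]
    return word


print(transform('oko'))   # the match covers the whole word
print(transform('okno'))  # no match, so the word is returned unchanged
```

The helper keeps the "match or None" logic in one place, so each transformation in the loop reduces to a single `if` with no exception handling.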

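As a side note, never use a bare except clause; always catch only the expected exceptions (here it would have been an AttributeError, since re.match() returns None when there is no match, and None of course has no attribute group).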
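A small sketch of why this matters, assuming a hypothetical pattern `DIGITS` and two made-up helper functions: with a bare except, passing the wrong type of argument (a genuine bug) is silently turned into "no match", whereas catching only AttributeError lets the bug surface.

```python
import re

DIGITS = re.compile(r'\d+')  # hypothetical pattern, for illustration only


def leading_digits_bare(line):
    # Bare except: swallows every error, including genuine bugs.
    try:
        return DIGITS.match(line).group()
    except:
        return None


def leading_digits_narrow(line):
    # Only the expected failure is handled: match() returned None.
    try:
        return DIGITS.match(line).group()
    except AttributeError:
        return None


print(leading_digits_bare('abc'))    # no match
print(leading_digits_narrow('abc'))  # no match
print(leading_digits_bare(None))     # wrong argument type, silently hidden

try:
    leading_digits_narrow(None)
except TypeError as exc:
    print('bug surfaced: %s' % exc)
```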

This most likely won't solve your problem, but at least you'll then know the problem is elsewhere.
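If you want to check whether particular input lines are responsible (the "worst case due to your data" mentioned above), one possible approach is to time the transformation per line and look at the slowest ones. This is only a sketch: `slowest_lines` is a hypothetical helper, and `str.upper` stands in for whatever transformation function you actually use.

```python
from timeit import default_timer


def slowest_lines(lines, transform, top=3):
    """Time transform() on each line; return the `top` slowest as (seconds, line)."""
    timings = []
    for line in lines:
        start = default_timer()
        transform(line)
        timings.append((default_timer() - start, line))
    timings.sort(reverse=True)
    return timings[:top]


# Hypothetical usage with a trivial stand-in transformation.
sample = ['oko', 'okno', 'ala']
for seconds, line in slowest_lines(sample, str.upper):
    print('%.6f s  %r' % (seconds, line))
```

If a handful of lines take far longer than the rest, the input data (rather than the try/except blocks) is the place to look.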

