IndexError while iterating through a generator

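I am trying to solve a problem for my programming classes. I am given a folder containing emails and special files. The names of special files always start with "!". I am supposed to add a method emails() to the Corpus class. The method should be a generator. Here is an example of its use: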

corpus = Corpus('/path/to/directory/with/emails')
count = 0
# Go through all emails and print the filename and the message body
for fname, body in corpus.emails():
    print(fname)
    print(body)
    print('-------------------------')
    count += 1
print('Finished: ', count, 'files processed.')

Here is the class and the method I wrote:

class Corpus:
    def __init__(self, path_to_mails_directory):
        self.path_to_mails_directory = path_to_mails_directory

    def emails(self):
        iterator = 0
        mail_body = None
        mails_folder = os.listdir(self.path_to_mails_directory)
        lenght = len(mails_folder)
        while iterator <= lenght:
            if not mails_folder[iterator].startswith("!"):
                with open(self.path_to_mails_directory+"/"+mails_folder[iterator]) as an_e_mail:
                    mail_body = an_e_mail.read()
                yield mails_folder[iterator], mail_body
            iterator += 1

I tried running the example code like this:

if __name__ == "__main__":
    my_corpus = Corpus("data/1")
    my_gen = my_corpus.emails()
    count = 0
    for fname, body in my_gen:
        print(fname)
        print(body)
        print("------------------------------")
        count += 1
    print("finished: " + str(count))

Python printed a large number of emails as expected (the folder contains about a thousand files), and then:

Traceback (most recent call last):
  File "C:/Users/tvavr/PycharmProjects/spamfilter/corpus.py", line 26, in <module>
    for fname, body in my_gen:
  File "C:/Users/tvavr/PycharmProjects/spamfilter/corpus.py", line 15, in emails
    if not mails_folder[iterator].startswith("!"):
IndexError: list index out of range

I don't know where the problem is and would appreciate any help. Thanks.

Edit: I updated the code slightly based on your suggestions.

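The IndexError comes from the loop condition "while iterator <= lenght": valid indices run from 0 to lenght - 1, so the final pass reads mails_folder[lenght], which does not exist. Changing "<=" to "<" would fix it, but a better way to do this is as follows: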

def emails(self):
    # Requires "import os" at the top of the module.
    mails_folder = os.listdir(self.path_to_mails_directory)
    for mail in mails_folder:
        if mail.startswith("!"):
            continue  # skip the special files
        # os.path.join builds the path without hard-coding the separator
        with open(os.path.join(self.path_to_mails_directory, mail)) as an_e_mail:
            mail_body = an_e_mail.read()
        yield mail, mail_body

Index-based iteration is not considered Pythonic. You should prefer the "for mail in mails_folder:" syntax.
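
To see both points in isolation, here is a small self-contained sketch. It is not the asker's actual setup: the helper names off_by_one() and demo(), the file names a.txt, b.txt and !truth.txt, the utf-8 encoding, and the use of sorted() are all assumptions made for illustration. off_by_one() reproduces the "<=" loop condition on a plain list and reports the index at which it overruns; demo() runs a direct-iteration version of emails() against a throwaway directory.

```python
import os
import tempfile


def off_by_one(items):
    """Reproduce the question's "<=" condition; return the index that fails."""
    i = 0
    try:
        while i <= len(items):  # bug: allows i == len(items)
            items[i]            # raises IndexError on the final pass
            i += 1
    except IndexError:
        return i
    return None


class Corpus:
    def __init__(self, path_to_mails_directory):
        self.path_to_mails_directory = path_to_mails_directory

    def emails(self):
        """Yield (filename, body) for each file whose name does not start with '!'."""
        # sorted() is an addition so the order is predictable;
        # os.listdir() returns names in arbitrary order.
        for mail in sorted(os.listdir(self.path_to_mails_directory)):
            if mail.startswith("!"):
                continue  # special file, not an email
            path = os.path.join(self.path_to_mails_directory, mail)
            # encoding is an assumption; adjust it to the real data
            with open(path, encoding="utf-8") as an_e_mail:
                mail_body = an_e_mail.read()
            yield mail, mail_body


def demo():
    """Build a temporary directory (hypothetical file names) and list its emails."""
    files = [("a.txt", "hello"), ("b.txt", "world"), ("!truth.txt", "a.txt OK")]
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in files:
            with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
                f.write(text)
        # list() exhausts the generator before the directory is removed
        return list(Corpus(tmp).emails())


if __name__ == "__main__":
    print(off_by_one(["x", "y", "z"]))
    print(demo())
```

Because the for loop never computes an index, there is no boundary condition left to get wrong, which is the practical reason to prefer it here.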