IndexError while iterating through a generator
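I'm trying to solve a problem for my programming classes. I'm given a folder containing emails and special files. The names of the special files always start with "!". I'm supposed to add a method emails() to the Corpus class. The method should be a generator. Here is an example of its use: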
    corpus = Corpus('/path/to/directory/with/emails')
    count = 0
    # Go through all emails and print the filename and the message body
    for fname, body in corpus.emails():
        print(fname)
        print(body)
        print('-------------------------')
        count += 1
    print('Finished: ', count, 'files processed.')
Here is the class and the method I wrote:
    import os

    class Corpus:
        def __init__(self, path_to_mails_directory):
            self.path_to_mails_directory = path_to_mails_directory

        def emails(self):
            iterator = 0
            mail_body = None
            mails_folder = os.listdir(self.path_to_mails_directory)
            lenght = len(mails_folder)
            while iterator <= lenght:
                if not mails_folder[iterator].startswith("!"):
                    with open(self.path_to_mails_directory+"/"+mails_folder[iterator]) as an_e_mail:
                        mail_body = an_e_mail.read()
                    yield mails_folder[iterator], mail_body
                iterator += 1
I tried running the example code like this:
    if __name__ == "__main__":
        my_corpus = Corpus("data/1")
        my_gen = my_corpus.emails()
        count = 0
        for fname, body in my_gen:
            print(fname)
            print(body)
            print("------------------------------")
            count += 1
        print("finished: " + str(count))
Python prints a large number of emails as expected (the folder contains about a thousand files), and then:
    Traceback (most recent call last):
      File "C:/Users/tvavr/PycharmProjects/spamfilter/corpus.py", line 26, in <module>
        for fname, body in my_gen:
      File "C:/Users/tvavr/PycharmProjects/spamfilter/corpus.py", line 15, in emails
        if not mails_folder[iterator].startswith("!"):
    IndexError: list index out of range
I don't know where the problem is and would appreciate any help. Thanks.
Edit: I updated the code slightly based on your suggestions.
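The IndexError comes from the loop condition "while iterator <= lenght": valid indices run from 0 to lenght - 1, so the final pass reads mails_folder[lenght], which does not exist. Changing "<=" to "<" would fix it, but a better way to do this is as follows: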
    def emails(self):
        mail_body = None
        mails_folder = os.listdir(self.path_to_mails_directory)
        for mail in mails_folder:
            if mail.startswith("!"):
                pass
            else:
                with open(self.path_to_mails_directory+"/"+mail) as an_e_mail:
                    mail_body = an_e_mail.read()
                yield mail, mail_body
Index-based iteration is not considered Pythonic. Prefer the "for mail in mails_folder:" form, which also cannot run past the end of the list.
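For a version that can be run end to end, here is a minimal sketch of the same idea. It builds a small throwaway directory so the generator has something to read. The file names, the sorted() call (added only to make the order predictable) and the utf-8 encoding are assumptions made for this illustration; they are not part of the original assignment.

```python
import os
import tempfile


class Corpus:
    def __init__(self, path_to_mails_directory):
        self.path_to_mails_directory = path_to_mails_directory

    def emails(self):
        """Yield (filename, body) for every file whose name does not start with '!'."""
        # sorted() is only here for a predictable order; os.listdir() order is arbitrary.
        for fname in sorted(os.listdir(self.path_to_mails_directory)):
            if fname.startswith("!"):
                continue  # skip the special files
            path = os.path.join(self.path_to_mails_directory, fname)
            # Assumption: the files are utf-8 text; real mail may need another encoding.
            with open(path, encoding="utf-8") as an_e_mail:
                yield fname, an_e_mail.read()


# Hypothetical sample data: two mails and one special file.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "hello"), ("b.txt", "world"), ("!truth.txt", "labels")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)
    result = list(Corpus(tmp).emails())

print(result)  # [('a.txt', 'hello'), ('b.txt', 'world')]
```

Because the loop walks the list itself, there is no counter to keep in sync with the list length, so the off-by-one from the question cannot occur.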