Why do I miss some iterations in Python?

Question

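I have a part-of-speech (POS) tagged parallel corpus with 25 source-language files in directory1 and 25 target-language files in directory2. Each file contains 1000 lines, i.e. 25000 lines per directory.

The task at hand: I want to strip the POS tags and then write all of the source-language text and all of the target-language text into separate text files, say source.txt and target.txt.

Luckily, I managed to do this (see the code below), but when I run the code, source.txt or target.txt sometimes ends up with 24896 lines, or 24871 lines, and so on, rather than 25000. After re-running the code 2-3 times, I get 25000 lines in both files.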

Sample POS tagged input: Need\VBN of\IN delivery\NN with\IN operation\NN .\.

This behavior is mysterious to me (I am not a CS graduate). Is there any explanation for it? Or is that just how it is?

Forgive me if this is a silly question!

import os

outfile1 = open("source.txt",'w')
outfile2 = open("target.txt",'w')

path = '/somePath/'
file_names = []; tempDataSrc = []; tempDataTrg = []

for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.txt'):
            file_names.append(os.path.join(root, file))

file_names = sorted(file_names)

for file in file_names:  
    if ("Src_" in file): # filtering source language files
        infile1 = open(file,'r')
        for line_s in infile1:
            # keep only the word in front of the backslash, dropping the POS tag
            line_s = " ".join(word.split("\\")[0] for word in line_s.split())
            tempDataSrc.append(line_s)

for file in file_names: 
    if ("Trg_" in file): # filtering target language files
        infile2 = open(file,'r')
        for line_t in infile2:
            # keep only the word in front of the backslash, dropping the POS tag
            line_t = " ".join(word.split("\\")[0] for word in line_t.split())
            tempDataTrg.append(line_t)

for line1 in tempDataSrc:
    outfile1.write(line1+'\n')

for line2 in tempDataTrg:
    outfile2.write(line2+'\n')

Note: I have Python 3.6 installed through conda. I write and run my code in the Spyder IDE; OS: Ubuntu 14.04.5

PS: Any suggestions for writing the code in a more Pythonic way are also welcome.

Answer 1

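I suspect this behavior comes from your runtime environment (the IDE or the OS itself) terminating the process abruptly before it has finished writing the output to the files, because you never close the output files in your code. Writes are buffered in memory, so whatever has not been flushed when the process ends never reaches the disk.

You can fix this simply by calling the .close() method on "outfile1" and "outfile2" at the very end of your code.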
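To see why the missing .close() matters, here is a minimal sketch that compares the on-disk size of a file before and after closing it. The helper name sizes_before_and_after_close and the temporary file are made up for this illustration, and the exact amount that is still buffered before the close depends on the platform and buffer size.

```python
import os
import tempfile

def sizes_before_and_after_close(lines):
    """Return the on-disk size of a freshly written file before and after close()."""
    # Hypothetical demo file in a throwaway directory.
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    outfile = open(path, "w")
    for line in lines:
        outfile.write(line + "\n")
    # Some or all of the data may still sit in Python's in-memory buffer here.
    size_before = os.path.getsize(path)
    outfile.close()  # flushes the buffer to disk and releases the file handle
    size_after = os.path.getsize(path)
    return size_before, size_after

sample = ["Need of delivery with operation ."] * 3
before, after = sizes_before_and_after_close(sample)
print("bytes on disk before close:", before)
print("bytes on disk after close: ", after)
```

For a small write like this one, the first number is typically smaller than the second, which is the same effect as the missing lines in source.txt and target.txt.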

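However, since you asked for input on doing things in a more Pythonic way: as you only write the output at the end of the script, it makes sense to "open" the files close to that part of the code as well. And while we are at it, you might as well use the with statement to create and write the two files, which ensures that all generated data is flushed to disk, even if the program terminates early because of some other error: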

with open("source.txt",'w') as outfile1:
    for line1 in tempDataSrc:
        outfile1.write(line1+'\n')

with open("target.txt",'w') as outfile2:
    for line2 in tempDataTrg:
        outfile2.write(line2+'\n')

(The with statement automatically closes the file and flushes the data.)
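Going one step further, the whole script could be restructured along the following lines. This is only a sketch under a few assumptions: the helper names strip_pos_tags and merge_files are invented here, the "Src_" marker is matched against the file name rather than the full path, and the demo builds a small throwaway corpus instead of reading from '/somePath/'.

```python
import os
import tempfile

def strip_pos_tags(line, sep="\\"):
    """Keep the part of each token before the separator (the word) and drop the tag."""
    return " ".join(token.split(sep)[0] for token in line.split())

def merge_files(root_dir, marker, out_path):
    """Write the untagged lines of every .txt file whose name contains marker to out_path."""
    names = sorted(
        os.path.join(root, name)
        for root, _dirs, files in os.walk(root_dir)
        for name in files
        if name.endswith(".txt") and marker in name
    )
    count = 0
    # Both with blocks close their files on exit, even if an error occurs.
    with open(out_path, "w") as outfile:
        for name in names:
            with open(name) as infile:
                for line in infile:
                    outfile.write(strip_pos_tags(line) + "\n")
                    count += 1
    return count

# Demo on a tiny made-up corpus: two source files with three lines each.
demo_dir = tempfile.mkdtemp()
tagged = "Need\\VBN of\\IN delivery\\NN with\\IN operation\\NN .\\.\n"
for i in range(2):
    with open(os.path.join(demo_dir, f"Src_{i}.txt"), "w") as f:
        f.write(tagged * 3)

out_path = os.path.join(demo_dir, "source.txt")
written = merge_files(demo_dir, "Src_", out_path)
print(written, "lines written to", out_path)
```

Calling merge_files a second time with "Trg_" and target.txt would handle the target language. Streaming each line straight to the output file also avoids holding all 25000 lines in the temporary lists.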
