python, finding duplicates

Question

The first piece of code returns 5021 lines, while the second returns only 2507. Can anyone tell me why? I am trying to find duplicates.

Code 1:

with open('output.txt', 'w', encoding = 'utf-8') as f_out:
    with open('org2fsjapan.txt', encoding = 'utf-8') as jap:
        a = jap.readline()
        f_out.write(a)
        for lines in jap:
            a = lines.find('1000190522')
            if not a == -1:   
                f_out.write(lines)

Code 2:

with open('output.txt','w') as f:

    with open('org2fsjapan.txt', encoding = 'utf-8') as jap:
        for lines in jap:
            lines = jap.readline()
            a = lines.find('1000190522')
            if not a == -1:
                xl = lines.split('|^|')
                f.write(xl[0]+','+xl[5]+'\n')

Answer 1

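It looks to me like code 1 has an extra f_out.write(a) right after a = jap.readline(). Because it sits outside the loop, it writes the first line of the file to the output even if that line does not match.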
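A minimal sketch of code 1 with that unconditional write made explicit. It assumes the first line is a header row, which the question does not state; the function name `copy_matching_lines` and the `keep_header` flag are hypothetical, introduced only for illustration.

```python
def copy_matching_lines(src_path, dst_path, needle, keep_header=False):
    """Copy lines containing `needle` from src_path to dst_path.

    Returns the number of lines written. `keep_header` is a hypothetical
    flag: when True, the first line is copied without being tested,
    which is what code 1 does.
    """
    written = 0
    with open(src_path, encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        if keep_header:
            header = src.readline()
            if header:  # empty string means the file was empty
                dst.write(header)
                written += 1
        # The for loop continues from wherever readline() left off.
        for line in src:
            if needle in line:
                dst.write(line)
                written += 1
    return written
```

With `keep_header=False` the first line is tested like every other line, so the count reflects matches only.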

Answer 2

In the second program, you have the following statements:

for lines in jap:
    lines = jap.readline()

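Both the for loop and the call to readline() read a line from the file that jap refers to, so each iteration reads two lines. Only the line returned by readline() is tested; the one fetched by the for loop is discarded. Roughly every other line is therefore never checked, which is why code 2 writes about half as many lines as code 1.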
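The effect can be reproduced without the original data file. The sketch below uses an in-memory `io.StringIO` stream in place of `org2fsjapan.txt`; the helper names `count_checked` and `extract_fields` are hypothetical, and `extract_fields` assumes every matching line has at least six `|^|`-separated fields, as code 2 implies.

```python
import io


def count_checked(text, double_read):
    """Count how many lines the loop body actually inspects."""
    checked = 0
    stream = io.StringIO(text)
    for line in stream:
        if double_read:
            # Same mistake as code 2: discard the line the for loop just
            # fetched and read the following one instead.
            line = stream.readline()
        if line:  # readline() returns '' once the stream is exhausted
            checked += 1
    return checked


def extract_fields(stream, needle, sep='|^|'):
    """Return 'first,sixth' field pairs for lines containing `needle`.

    The for loop alone advances the stream; there is no readline() call.
    """
    rows = []
    for line in stream:
        if needle in line:
            fields = line.rstrip('\n').split(sep)
            rows.append(fields[0] + ',' + fields[5])
    return rows


sample = ''.join(f'row {i}\n' for i in range(10))
print(count_checked(sample, double_read=True))   # 5
print(count_checked(sample, double_read=False))  # 10
```

Dropping the `lines = jap.readline()` statement from code 2 is the whole fix; `extract_fields` shows the loop in that form. It also strips the trailing newline before splitting, which the original does not do.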

python

csv

duplicates