Remove both duplicates (original and duplicate) from a text file using Python

Question

I'm trying to remove both copies of every duplicated line, for example:

STANGHOLMEN_TA02_GT11
STANGHOLMEN_TA02_GT41
STANGHOLMEN_TA02_GT81
STANGHOLMEN_TA02_GT11
STANGHOLMEN_TA02_GT81

Expected result:

STANGHOLMEN_TA02_GT41

I tried this script:

lines_seen = set()
with open("example.txt", "w") as output_file, open("example2.txt", "r") as input_file:
    for each_line in input_file:
        # Write a line only the first time it appears
        if each_line not in lines_seen:
            output_file.write(each_line)
            lines_seen.add(each_line)

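Unfortunately, it doesn't work the way I want: it misses lines and doesn't remove them. The original file also occasionally has spaces between the lines.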
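The script above keeps the first copy of each line rather than dropping every copy, which is why the repeated entries survive. It also compares raw lines, including the trailing newline and any stray spaces. A minimal in-memory sketch of that one-pass logic, using the sample lines from the example (the helper name `first_occurrences` is hypothetical):

```python
# Sample data taken from the example in the question.
lines = [
    "STANGHOLMEN_TA02_GT11",
    "STANGHOLMEN_TA02_GT41",
    "STANGHOLMEN_TA02_GT81",
    "STANGHOLMEN_TA02_GT11",
    "STANGHOLMEN_TA02_GT81",
]


def first_occurrences(items):
    """Keep the first copy of each item, as the set-based script does."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result


print(first_occurrences(lines))
# ['STANGHOLMEN_TA02_GT11', 'STANGHOLMEN_TA02_GT41', 'STANGHOLMEN_TA02_GT81']

# Raw lines read from a file keep their newline and any trailing spaces,
# so these two count as different lines:
print("STANGHOLMEN_TA02_GT81\n" == "STANGHOLMEN_TA02_GT81 \n")  # False
```

GT11 and GT81 are still present in the output, only deduplicated, whereas the desired result contains GT41 alone.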

Answer 1

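You need two passes for this to work. With a single pass you cannot know whether the current line will be repeated later in the file. Try something like this: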

# First pass: count how many times each line occurs
lines_count = {}
with open("example2.txt", "r") as input_file:
    for each_line in input_file:
        lines_count[each_line] = lines_count.get(each_line, 0) + 1

# Second pass: write only the lines that occur exactly once
with open("example.txt", "w") as output_file:
    for each_line, count in lines_count.items():
        if count == 1:
            output_file.write(each_line)
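Because the counting is done on raw lines, `"GT81\n"` and `"GT81 \n"` (or a final line with no newline) are counted separately. One possible variant, assuming that lines differing only in leading/trailing whitespace should count as the same and that blank lines should be skipped (both are assumptions based on the question's mention of spaces between lines), uses `collections.Counter` for the counting pass. The function names `remove_all_duplicates` and `dedupe_file` are hypothetical:

```python
from collections import Counter


def remove_all_duplicates(lines):
    """Return the lines that occur exactly once.

    Assumption: surrounding whitespace is ignored and blank lines are dropped.
    """
    # First pass: normalize each line and count occurrences (blank lines skipped).
    counts = Counter(line.strip() for line in lines if line.strip())
    # Second pass: keep lines seen exactly once; Counter (a dict subclass)
    # preserves first-insertion order on Python 3.7+.
    return [line for line in counts if counts[line] == 1]


def dedupe_file(src, dst):
    """Write the lines of src that occur exactly once to dst."""
    with open(src, "r") as input_file:
        unique = remove_all_duplicates(input_file)
    with open(dst, "w") as output_file:
        for line in unique:
            output_file.write(line + "\n")


sample = [
    "STANGHOLMEN_TA02_GT11\n",
    "\n",
    "STANGHOLMEN_TA02_GT41 \n",
    "STANGHOLMEN_TA02_GT81\n",
    "STANGHOLMEN_TA02_GT11\n",
    "STANGHOLMEN_TA02_GT81",
]
print(remove_all_duplicates(sample))  # ['STANGHOLMEN_TA02_GT41']
```

Calling `dedupe_file("example2.txt", "example.txt")` on the sample input would write only `STANGHOLMEN_TA02_GT41`. Each output line is rewritten with a plain newline, so any original trailing spaces are not preserved.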
