Remove repeated words in a file with Python

I have a text file in which many words are repeated. I need each word to appear only once.

Here is the code I am trying to develop:

import codecs

wordList = codecs.open('Arquivo.txt' , 'r')
wordList2 = codecs.open('Arquivo2.txt', 'w')

for x in range(len(wordList)) :
    for y in range(x + 1, len(wordList ) ):
        if wordList[x] == wordList[y]:
            wordList2.append(wordList[x] )
        for y in wordList2:
            wordList.remove(y)

Error:

    wordList2 = codecs.open('File2.txt', 'w').readline()
IOError: File not open for reading
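The traceback comes from calling a read method on a file opened with mode 'w', which is write-only. The sketch below reproduces this in Python 3 with a temporary scratch file (the temporary file and the helper name read_from_write_only_file are hypothetical, for the demo only). Note that Python 3 raises io.UnsupportedOperation, a subclass of OSError, where Python 2 raised the IOError shown above; in Python 3, IOError is just an alias of OSError.

```python
import io
import os
import tempfile

def read_from_write_only_file():
    """Hypothetical helper: open a scratch file with mode 'w' and try to read it.

    Returns the exception raised, or None if reading unexpectedly worked.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            try:
                f.readline()  # mode 'w' is write-only, so reading fails
            except io.UnsupportedOperation as exc:
                return exc
        return None
    finally:
        os.remove(path)  # clean up the scratch file

err = read_from_write_only_file()
print(type(err).__name__)        # UnsupportedOperation
print(isinstance(err, OSError))  # True
```

The fix is to open a file with 'r' when you want to read it and with 'w' only when you want to write it.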

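Maybe you want to try this. It turns wordList into a list instead of a file object; do the same for wordList2. codecs.open() returns a file object, which has no len(), cannot be indexed, and has no .append() or .remove(), and a file opened with 'w' cannot be read at all, which is what the IOError above reports.

.strip() removes the trailing newline (along with any other leading or trailing whitespace).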

wordList =[line.strip() for line in codecs.open('File.txt' , 'r').readlines()]
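To see why the list matters, the short sketch below (the sample file contents are hypothetical) shows that len() fails on a file object but works on a list built from its stripped lines. It uses the built-in open() with an explicit encoding rather than codecs.open(); for reading plain text the two behave the same here.

```python
import os
import tempfile

# Write a small sample file (hypothetical contents, one word per line)
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("apple\nbanana\napple\n")

with open(path, "r", encoding="utf-8") as f:
    try:
        len(f)  # a file object has no length...
    except TypeError as exc:
        print("file object:", exc)

with open(path, "r", encoding="utf-8") as f:
    # ...but a list of its stripped lines supports len(), indexing and .remove()
    wordList = [line.strip() for line in f]

os.remove(path)
print(wordList)       # ['apple', 'banana', 'apple']
print(len(wordList))  # 3
```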

Edit: here is the complete code; I hope it works for you.

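import codecs

# Read one word per line into a list; .strip() drops the trailing newline
with codecs.open('File.txt', 'r') as f:
    wordList = [line.strip() for line in f]

# wordList2 collects one entry for every extra copy of a word
# (start it empty: File2.txt was opened for writing, so it is output, not input)
wordList2 = []
for x in range(len(wordList)):
    for y in range(x + 1, len(wordList)):
        if wordList[x] == wordList[y]:
            wordList2.append(wordList[x])
            break  # record each occurrence at most once
# Remove the extra copies only after the scan, so the indices above stay valid
for y in wordList2:
    wordList.remove(y)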

# assuming the code above is working
# now write your updated contents
with open('outfile1.txt','w') as outfile1:
    for word in wordList:
        outfile1.write(word + '\n')

with open('outfile2.txt','w') as outfile2:
    for word in wordList2:
        outfile2.write(word + '\n')
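The nested loops above compare every pair of words, which takes O(n²) time. Assuming the goal is to keep the first occurrence of each word and that words are separated by whitespace (possibly several per line), a single pass with a set does the same job in roughly linear time. This is a sketch, not the answer's method; the function name remove_repeated_words and the sample text are hypothetical.

```python
import os
import tempfile

def remove_repeated_words(src_path, dst_path):
    """Hypothetical helper: copy the words of src_path to dst_path, keeping
    only the first occurrence of each word (one word per output line).

    Assumes words are separated by whitespace; returns the unique words.
    """
    seen = set()   # words already kept (average O(1) membership test)
    unique = []    # first occurrences, in original order
    with open(src_path, "r", encoding="utf-8") as src:
        for line in src:
            for word in line.split():
                if word not in seen:
                    seen.add(word)
                    unique.append(word)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for word in unique:
            dst.write(word + "\n")
    return unique

# Usage with temporary files standing in for Arquivo.txt / Arquivo2.txt
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "Arquivo.txt")
dst = os.path.join(tmpdir, "Arquivo2.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("the cat saw the dog\nthe dog ran\n")

result = remove_repeated_words(src, dst)
print(result)  # ['the', 'cat', 'saw', 'dog', 'ran']
```

Reading from one file and writing to a different one also avoids the original error, since neither file is ever read and written through the same handle.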

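Edit 2: if you want to use a dictionary instead of a list (a dictionary lookup takes O(1) time on average, instead of brute-force comparing every pair of items across two lists):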

wordList = {line.strip():1 for line in codecs.open('File.txt' , 'r').readlines()}

where line.strip() is your key and 1 is your value. Because dictionary keys are unique, each word is stored only once. To "remove" a word, set its value to 0 with wordList[word] = 0.
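A minimal sketch of the dictionary approach with the 1/0 flag, using a hypothetical in-memory list of lines in place of File.txt. It relies on dictionaries preserving insertion order, which Python guarantees from 3.7 on.

```python
# Hypothetical sample of lines as they would come from File.txt
lines = ["apple\n", "banana\n", "apple\n", "cherry\n", "banana\n"]

# Duplicate keys collapse: each word is stored once, in first-seen order
wordList = {line.strip(): 1 for line in lines}
print(list(wordList))  # ['apple', 'banana', 'cherry']

# "Remove" a word by flagging it with 0 instead of deleting the key
wordList["banana"] = 0

# Keep only the words whose flag is still 1
kept = [word for word, flag in wordList.items() if flag == 1]
print(kept)  # ['apple', 'cherry']
```

If you do not need the flag, del wordList[word] removes the key outright.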