Remove repeated words in a file with Python
I have a text file in which many words are repeated.
I need each word to appear only once.
Below is the code I am trying to develop:
import codecs
wordList = codecs.open('Arquivo.txt', 'r')
wordList2 = codecs.open('Arquivo2.txt', 'w')
for x in range(len(wordList)):
    for y in range(x + 1, len(wordList)):
        if wordList[x] == wordList[y]:
            wordList2.append(wordList[x])
for y in wordList2:
    wordList.remove(y)
Error:
wordList2 = codecs.open('File2.txt', 'w').readline()
IOError: File not open for reading
Maybe you want to try this. It makes wordList a list instead of a file object; do the same for wordList2. Calling .strip() removes the newline character from each line.
wordList = [line.strip() for line in codecs.open('File.txt', 'r').readlines()]
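To illustrate the difference between a file object and a list, here is a minimal sketch. The helper name read_words and the sample contents are assumptions made for this example, not part of the original code, and it uses the built-in open rather than codecs.open.

```python
import os
import tempfile


def read_words(path):
    """Return the lines of a text file as a list, with whitespace stripped."""
    # 'r' opens the file for reading; a file opened with 'w' cannot be read.
    with open(path, 'r', encoding='utf-8') as handle:
        return [line.strip() for line in handle]


# Small demonstration on a temporary file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'File.txt')
    with open(sample_path, 'w', encoding='utf-8') as handle:
        handle.write('apple\nbanana\napple\n')
    words = read_words(sample_path)

# Unlike a file object, a list supports len() and indexing.
print(len(words), words[0])
```

Because words is a plain list, loops such as range(len(words)) and calls such as words.remove(...) work on it.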
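Edit: here is the complete code, I hope it works for you.
import codecs

# Read each file into a list of words, one per line, without the newline.
# wordList2 starts with whatever File2.txt already holds (the file must exist).
wordList = [line.strip() for line in codecs.open('File.txt', 'r').readlines()]
wordList2 = [line.strip() for line in codecs.open('File2.txt', 'r').readlines()]

# Record every repeated occurrence: a word counts as a repeat
# if it already appeared earlier in the list.
for x in range(len(wordList)):
    if wordList[x] in wordList[:x]:
        wordList2.append(wordList[x])

# Remove one occurrence per recorded repeat, so one copy of each word stays.
for y in wordList2:
    if y in wordList:
        wordList.remove(y)

# assuming the code above is working
# now write your updated contents
with open('outfile1.txt', 'w') as outfile1:
    for word in wordList:
        outfile1.write(word + '\n')

with open('outfile2.txt', 'w') as outfile2:
    for word in wordList2:
        outfile2.write(word + '\n')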
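As an alternative to comparing list entries against each other, the same split can be sketched in a single pass with a set of words already seen. This is only an illustration: the function name split_unique_and_duplicates and the sample words are assumptions, not part of the answer's code.

```python
def split_unique_and_duplicates(words):
    """Return (unique, duplicates), both in order of first appearance."""
    seen = set()       # every word encountered so far
    reported = set()   # words already recorded as duplicates
    unique, duplicates = [], []
    for word in words:
        if word not in seen:
            seen.add(word)
            unique.append(word)
        elif word not in reported:
            reported.add(word)
            duplicates.append(word)
    return unique, duplicates


unique, duplicates = split_unique_and_duplicates(['a', 'b', 'a', 'c', 'a', 'b'])
print(unique)      # ['a', 'b', 'c']
print(duplicates)  # ['a', 'b']
```

Set membership tests take constant time on average, so this avoids the nested loops while keeping the first occurrence of every word.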
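Edit 2: If you would rather use a dictionary than a list (dictionary lookups take O(1) time on average, so you avoid comparing every pair of list entries to find duplicates):
wordList = {line.strip(): 1 for line in codecs.open('File.txt', 'r').readlines()}
Here line.strip() is the key and 1 is the value. To "remove" a word, set its value to 0 with wordList[word] = 0.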
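A minimal sketch of the dictionary idea, using an in-memory list in place of the file. The names words, word_flags and remaining, and the sample words, are assumptions chosen for this example.

```python
# Sample input standing in for the stripped lines of the file.
words = ['apple', 'banana', 'apple', 'cherry']

# Building the dictionary collapses repeated words, since keys are unique.
word_flags = {word: 1 for word in words}

# "Remove" a word by setting its value to 0 instead of deleting the key.
word_flags['banana'] = 0

# Keep only the words that are still flagged with 1.
remaining = [word for word, flag in word_flags.items() if flag == 1]
print(remaining)  # ['apple', 'cherry']
```

Dictionaries preserve insertion order in Python 3.7 and later, so the remaining words come out in the order they first appeared.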