Save unique words to a text file, one word per line
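[Using Python 3.3.3]
I'm trying to analyse text files, clean them up, print the number of unique words, and then save the list of unique words to a text file, one word per line, together with the number of times each unique word appears in the cleaned-up list of words.
So what I did was take the text file (a speech from Prime Minister Harper) and clean it up by keeping only valid alphabetical characters and single spaces. Then I counted the number of unique words. Now I need to write a text file of the unique words, with each unique word on its own line and, beside it, the number of occurrences of that word in the cleaned-up list. Here's what I have.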
def uniqueFrequency(newWords):
    '''Function returns a list of unique words with amount of occurances of that
    word in the text file.'''
    unique = sorted(set(newWords.split()))
    for i in unique:
        unique = str(unique) + i + " " + str(newWords.count(i)) + "\n"
    return unique

def saveUniqueList(uniqueLines, filename):
    '''Function saves result of uniqueFrequency into a text file.'''
    outFile = open(filename, "w")
    outFile.write(uniqueLines)
    outFile.close
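newWords is the cleaned-up version of the text file, containing only words and spaces and nothing else. So I want each unique word in newWords saved to a text file, one word per line, and beside each word the number of times it occurs in newWords (not in the list of unique words, because there every word would occur exactly once). What is wrong with my functions? Thanks!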
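Based on
    unique = sorted(set(newWords.split()))
    for i in unique:
        unique = str(unique) + i + " " + str(newWords.count(i)) + "\n"
I'm guessing newWords is not a list of strings but one long string. If so, newWords.count(i) counts substring matches rather than whole words, so it will not give the right count for each i (for example, "a" is also counted inside "and").
Try: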
def uniqueFrequency(newWords):
    '''Function returns a list of unique words with amount of occurances of that
    word in the text file.'''
    wordList = newWords.split()
    unique = sorted(set(wordList))
    ret = ""
    for i in unique:
        ret = ret + i + " " + str(wordList.count(i)) + "\n"
    return ret
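To illustrate why counting on the split list matters, here is a minimal sketch comparing str.count with list.count. The sample string is made up for illustration and is not taken from the speech.

```python
# Hypothetical sample text, not from the original speech.
new_words = "a cat and a hat"

# str.count counts substring matches, so "a" also matches
# inside "cat", "and" and "hat".
substring_count = new_words.count("a")

# list.count compares whole list items, so only the word "a" matches.
word_count = new_words.split().count("a")

print(substring_count, word_count)
```

The two numbers differ because only the second call treats the text as a sequence of words.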
unique = str(unique) + i + " " + str(newWords.count(i)) + "\n"
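The line above appends to the existing collection "unique" (the sorted list of words), so the text of that list ends up in the result. If you use a different variable name, such as "var", it should return the correct string.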
def uniqueFrequency(newWords):
    '''Function returns a list of unique words with amount of occurances of that
    word in the text file.'''
    var = ""
    # Split once so that count() matches whole words, not substrings.
    words = newWords.split()
    unique = sorted(set(words))
    for i in unique:
        var = var + i + " " + str(words.count(i)) + "\n"
    return var
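Separately, the question's saveUniqueList writes outFile.close without parentheses, so the method is referenced but never called. A minimal sketch of one way to avoid that, keeping the function name and parameters from the question and using a with block so the file is closed automatically:

```python
def saveUniqueList(uniqueLines, filename):
    '''Save the string produced by uniqueFrequency to a text file.'''
    # The with block closes the file on exit, even if write() raises,
    # so no explicit close() call is needed.
    with open(filename, "w") as outFile:
        outFile.write(uniqueLines)
```

It would be called as saveUniqueList(uniqueFrequency(newWords), filename), with the output file name of your choice.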
Try collections.Counter. It is designed for exactly this situation.
A demonstration in IPython:
In [1]: from collections import Counter
In [2]: txt = """I'm trying to analyse text files, clean them up, print the amount of unique words, then try to save the unique words list to a text file, one word per line with the amount of times each unique word appears in the cleaned up list of words. SO what i did was i took the text file (a speech from prime minister harper), cleaned it up by only counting valid alphabetical characters and single spaces, then i counted the amount of unique words, then i needed to make a saved text file of the unique words, with each unique word being on its own line and beside the word, the number of occurances of that word in the cleaned up list. Here's what i have."""
In [3]: Counter(txt.split())
Out[3]: Counter({'the': 10, 'of': 7, 'unique': 6, 'i': 5, 'to': 4, 'text': 4, 'word': 4, 'then': 3, 'cleaned': 3, 'up': 3, 'amount': 3, 'words,': 3, 'a': 2, 'with': 2, 'file': 2, 'in': 2, 'line': 2, 'list': 2, 'and': 2, 'each': 2, 'what': 2, 'did': 1, 'took': 1, 'from': 1, 'words.': 1, '(a': 1, 'only': 1, 'harper),': 1, 'was': 1, 'analyse': 1, 'one': 1, 'number': 1, 'them': 1, 'appears': 1, 'it': 1, 'have.': 1, 'characters': 1, 'counted': 1, 'list.': 1, 'its': 1, "I'm": 1, 'own': 1, 'by': 1, 'save': 1, 'spaces,': 1, 'being': 1, 'clean': 1, 'occurances': 1, 'alphabetical': 1, 'files,': 1, 'counting': 1, 'needed': 1, 'that': 1, 'make': 1, "Here's": 1, 'times': 1, 'print': 1, 'up,': 1, 'beside': 1, 'trying': 1, 'on': 1, 'try': 1, 'valid': 1, 'per': 1, 'minister': 1, 'file,': 1, 'saved': 1, 'single': 1, 'words': 1, 'SO': 1, 'prime': 1, 'speech': 1, 'word,': 1})
(Note that this solution is not perfect yet; it does not remove the commas from the words. Hint: use str.replace.)
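Following that hint, here is a rough sketch of stripping punctuation with str.replace before counting. The sample text is made up, and removing every character in string.punctuation is an assumption about what "clean" should mean; note that it also turns "I'm" into "Im".

```python
from collections import Counter
import string

# Hypothetical sample text with punctuation attached to some words.
txt = "words, words. word, (a harper), words"

# Remove each punctuation character in turn with str.replace.
cleaned = txt
for ch in string.punctuation:
    cleaned = cleaned.replace(ch, "")

# Count whole words in the cleaned text.
counts = Counter(cleaned.split())
print(counts["words"], counts["harper"])
```

With the punctuation gone, "words," and "words." are counted together with "words".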
Counter is a specialized dict, with the words as keys and the counts as values. So you can use it like this:
# Split into words first; Counter(txt) would count single characters.
cnts = Counter(txt.split())
with open('counts.txt', 'w') as outfile:
    for c in cnts:
        outfile.write("{} {}\n".format(c, cnts[c]))
Note that this solution uses some well-understood Python concepts:
- a context manager
- iterating over a dict (a dict is iterable; iterating yields its keys)
- str.format
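If the output file should be ordered by frequency, Counter.most_common() returns (word, count) pairs sorted by descending count. A minimal sketch, assuming the input is already cleaned; the function name write_counts and the sample text are hypothetical:

```python
from collections import Counter

def write_counts(text, filename):
    '''Write "word count" lines to filename, most frequent word first.'''
    counts = Counter(text.split())
    with open(filename, "w") as outfile:
        # most_common() yields (word, count) pairs, highest count first.
        for word, count in counts.most_common():
            outfile.write("{} {}\n".format(word, count))
    return counts

# Hypothetical cleaned text; writes three lines to counts.txt.
write_counts("b a b c b a", "counts.txt")
```

The order of words with equal counts is not something to rely on in older Python versions such as 3.3; sort the pairs explicitly if a stable order matters.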