Read from a text file and save word frequencies to a new text file, one per line
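Good day. I need some help, please. The language used is Python. The code below reads from a text file and then prints the frequency of each word on a new line. I got it from this site: https://rmtheis.wordpress.com/2012/09/26/count-word-frequency-with-python/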
import re
from collections import Counter

def openfile(filename):
    fh = open(filename, "r+")
    str = fh.read()
    fh.close()
    return str

def removegarbage(str):
    # Replace one or more non-word (non-alphanumeric) chars with a space
    str = re.sub(r'\W+', ' ', str)
    str = str.lower()
    return str

def getwordbins(words):
    cnt = Counter()
    for word in words:
        cnt[word] += 1
    return cnt

def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    words = txt.split(' ')
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):
        print(key, value)

main('hamlet.txt', 500)
As shown above, it prints fine in the IDE I use (PyCharm). But when I add the following code below the code above,
#Write to file
with open("newFile.txt", "w") as f:
    for word in main('hamlet.txt', 500):
        f.write(word + os.linesep)
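it prints fine on the console but shows an error, and it does not write anything to the text file I created. Below is a snippet of sample console output after reading the text file: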
the 16
of 12
to 9
search 9
which 6
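So now I want to write the output above to a text file. The content of the text file is much longer than what is shown above. Thank you. By the way, the error on the console is: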
Traceback (most recent call last):
  File "/Users/test/PycharmProjects/Trial/trial.py", line 52, in <module>
    for word in main("hamlet.txt", 500):
TypeError: 'NoneType' object is not iterable
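If you want to use the function main as shown, i.e.
    for word in main('hamlet.txt', 500):
then the function has to be adapted to that. As written it only prints and implicitly returns None, which is why the loop raises the TypeError.
One option is to make it a generator: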
def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    words = txt.split(' ')
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):
        # yield key  # generate only the word, not its frequency
        yield key, value

with open("newFile.txt", "w") as f:
    for word, freq in main('hamlet.txt', 500):
        f.write('%s\t%d\n' % (word, freq))
You need to return key, value instead of printing it.
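One caveat with a plain return: placed inside the for loop, it would leave the function after the first pair. A possible way around that is to return the whole list that most_common already builds. The following is only a minimal sketch under that assumption; the names top_words and write_frequencies are made up for illustration and are not part of the original code, and it works on a string rather than on hamlet.txt so that it runs on its own.

```python
import re
from collections import Counter


def top_words(text, topwords):
    """Return a list of (word, count) pairs for the most frequent words.

    Hypothetical helper: it returns the full list instead of printing,
    so the caller can iterate over it.
    """
    # Lowercase the text, then extract runs of word characters.
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(topwords)


def write_frequencies(pairs, out_path):
    """Write one 'word<TAB>count' line per pair (hypothetical helper)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for word, freq in pairs:
            # In text mode "\n" is translated to the platform line ending,
            # so os.linesep is not needed here.
            f.write(f"{word}\t{freq}\n")


sample = "the cat and the dog and the bird"
pairs = top_words(sample, 2)
print(pairs)
```

Calling write_frequencies(pairs, "newFile.txt") afterwards should then give one word and its count per line. For the real file, the text returned by openfile('hamlet.txt') could be passed in place of sample.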