Read from a text file and save word frequencies to a new text file, one per line

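Good day. I need some help, please. The language is Python. The code below reads from a text file and then prints each word with its frequency on a new line. I got it from this site: https://rmtheis.wordpress.com/2012/09/26/count-word-frequency-with-python/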

import re
from collections import Counter


def openfile(filename):
    fh = open(filename, "r+")
    str = fh.read()
    fh.close()
    return str


def removegarbage(str):
    # Replace one or more non-word (non-alphanumeric) chars with a space
    str = re.sub(r'\W+', ' ', str)
    str = str.lower()
    return str


def getwordbins(words):
    cnt = Counter()
    for word in words:
        cnt[word] += 1
    return cnt


def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    words = txt.split(' ')
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):

        print(key, value)

main('hamlet.txt', 500)

As you can see from the above, it prints fine in the IDE I use (PyCharm). But when I add the following code below the code above,

#Write to file
    with open("newFile.txt", "w") as f:
        for word in main('hamlet.txt', 500):
            f.write(word + os.linesep)

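it still prints fine on the console, but it shows an error and does not write anything to the text file I created. Below is a snippet showing sample console output after the text file is read. It prints: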

the 16
of 12
to 9
search 9
which 6

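So now I would like to write the output above to a text file. The content of the text file is much longer than what is shown above. Thank you. By the way, the error on the console is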

Traceback (most recent call last):
  File "/Users/test/PycharmProjects/Trial/trial.py", line 52, in <module>
    for word in main("hamlet.txt", 500):
TypeError: 'NoneType' object is not iterable

If you want to use the function main as shown, i.e.

for word in main('hamlet.txt', 500):

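then the function has to be adapted to that. As written, it only prints and implicitly returns None, which is what the TypeError is complaining about. You could, for example, use a generator: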

def main(filename, topwords):
    txt = openfile(filename)
    txt = removegarbage(txt)
    words = txt.split()  # no argument: splits on any whitespace and drops empty strings
    bins = getwordbins(words)
    for key, value in bins.most_common(topwords):
        # yield key  # would yield only the word, not its frequency
        yield key, value

with open("newFile.txt", "w") as f:
    for word, freq in main('hamlet.txt', 500):
        f.write('%s\t%d\n' % (word, freq))
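
For anyone who wants to try the generator approach without hamlet.txt, here is a minimal self-contained sketch. The names iter_top_words and write_frequencies are made up for illustration and are not part of the question's code. It writes '\n' rather than os.linesep, because a file opened in text mode already translates '\n' to the platform's line ending.

```python
import re
from collections import Counter


def iter_top_words(text, topwords):
    # Hypothetical helper: lowercase the text, pull out runs of word
    # characters, and yield (word, count) pairs, most frequent first.
    words = re.findall(r'\w+', text.lower())
    for word, freq in Counter(words).most_common(topwords):
        yield word, freq


def write_frequencies(text, topwords, out_path):
    # Hypothetical helper: write one "word<TAB>count" pair per line.
    with open(out_path, "w") as f:
        for word, freq in iter_top_words(text, topwords):
            f.write('%s\t%d\n' % (word, freq))
```

Passing the contents of hamlet.txt, 500 and "newFile.txt" to write_frequencies should give the same kind of file as the snippet above, with the reading, counting and writing steps kept separate.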

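You need to return the key, value pairs instead of printing them. main only prints and returns nothing, so the for loop receives None.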
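
A sketch of that idea, assuming the function is changed to return the whole list from most_common rather than a single pair (a return placed inside the loop would stop after the first word). The name top_word_counts and the sample sentence are hypothetical, chosen only so the example runs on its own.

```python
import re
from collections import Counter


def top_word_counts(text, topwords):
    # Hypothetical replacement for main(): return the (word, count) pairs
    # instead of printing them, so the caller can iterate over the result.
    words = re.sub(r'\W+', ' ', text).lower().split()
    return Counter(words).most_common(topwords)


pairs = top_word_counts("The search, the search: the end.", 2)
# One "word count" line per pair, in the same layout the console showed.
lines = ['%s %d\n' % (word, freq) for word, freq in pairs]
```

The resulting lines can then be handed to f.writelines(lines) on a file opened with "w".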