Count words (even multiples) in a text with Python

I have to write a function that counts how many times a word (or a sequence of words) occurs in a given text.

Here is my function so far. What I've noticed is that it works fine for a sequence of 3 words, but not for 4 words and so on.

from nltk import ngrams

def function(text, word):
    for char in ".?!-":
      text = text.replace(char, ' ')

    n = len(word.split())
    countN = 0
    bigram_lower = text.lower()
    word_lower = word.lower()

    n_grams = ngrams(bigram_lower.split(), n)

    for gram in n_grams:
        for i in range (0, n):
            if gram[i] == word_lower.split()[i]: 
                countN = countN + 1

    print (countN)

First, please fix your indentation, and don't use `bigram` as the variable name for your n-grams, since it's a bit confusing (you aren't only storing bigrams in it). Second, let's look at this part of the code —

for gram in bigrams:
    for i in range (0, n):
        if gram[i] == word_lower.split()[i]: 
            countN = countN + 1

print (countN)

Here, you increment countN by 1 every time a single word in the n-gram matches, instead of incrementing it only when the entire n-gram matches. You should increment countN only if all the words match —

for gram in bigrams:
    if list(gram) == word_lower.split(): 
        countN = countN + 1

print (countN)
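To see the difference concretely, here is a minimal self-contained sketch of both versions. It uses a `zip`-based stand-in for `nltk.ngrams` (an assumption so the snippet runs without nltk installed): the per-word check over-counts, while the whole-tuple check reports the true number of occurrences.

```python
# zip-based stand-in for nltk.ngrams: slides a window of size n
# over the token list, yielding tuples.
def ngrams(tokens, n):
    return zip(*(tokens[i:] for i in range(n)))

text = "to be or not to be".split()
target = "to be".split()

# Buggy version: increments once per matching word position.
buggy = 0
for gram in ngrams(text, len(target)):
    for i in range(len(target)):
        if gram[i] == target[i]:
            buggy += 1

# Fixed version: increments only when the whole n-gram matches.
fixed = sum(1 for gram in ngrams(text, len(target)) if list(gram) == target)

print(buggy, fixed)  # prints 4 2 — "to be" occurs only twice
```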


This may already have been answered here.

Is nltk mandatory?
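It isn't strictly required. Here is a hedged sketch of the same whole-n-gram count using only the standard library — `zip` over shifted copies of the word list builds the sliding windows that `nltk.ngrams` would otherwise provide (`count_phrase` is a name I'm introducing for illustration):

```python
def count_phrase(text, phrase):
    # Replace sentence punctuation with spaces, as in the original function.
    for char in ".?!-,":
        text = text.replace(char, ' ')
    words = text.lower().split()
    target = tuple(phrase.lower().split())
    n = len(target)
    # zip over shifted copies of the word list yields each n-word window.
    grams = zip(*(words[i:] for i in range(n)))
    return sum(1 for gram in grams if gram == target)

print(count_phrase("To be, or not to be - that is the question.", "to be"))
# prints 2
```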

# Create an empty dictionary 
d = dict() 

# Open the file in read mode; the with-block closes it automatically 
with open("sample.txt", "r") as text: 
    # Loop through each line of the file 
    for line in text: 
        # Remove the leading spaces and newline character 
        line = line.strip() 

        # Convert the characters in line to  
        # lowercase to avoid case mismatch 
        line = line.lower() 

        # Split the line into words (split() also drops repeated spaces) 
        words = line.split() 

        # Iterate over each word in line 
        for word in words: 
            # Check if the word is already in dictionary 
            if word in d: 
                # Increment count of word by 1 
                d[word] = d[word] + 1
            else: 
                # Add the word to dictionary with count 1 
                d[word] = 1

# Print the contents of dictionary 
for key in list(d.keys()): 
    print(key, ":", d[key]) 
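The same single-word tally can be done with `collections.Counter` from the standard library, which handles the "is the key already there" bookkeeping itself. A minimal sketch, using an inline string in place of the file contents:

```python
from collections import Counter

# Count lowercase words across the whole text in one pass.
text = "the quick brown fox jumps over the lazy dog the fox"
counts = Counter(text.lower().split())
print(counts["the"], counts["fox"])  # prints 3 2
```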

This should work for you:


import nltk

def function(text, word):
    for char in ".?!-,":
        text = text.replace(char, ' ')
    n = len(word.split())
    countN = 0
    bigram_lower = text.lower()
    word_lower = tuple(word.lower().split())
    bigrams = nltk.ngrams(bigram_lower.split(), n)
    for gram in bigrams:
        if gram == word_lower:
            countN += 1
    print(countN)
>>> tekst="this is the text i want to search, i want to search it for the words i want to search for, and it should count the occurances of the words i want to search for"
>>> function(tekst, "i want to search")
4

>>> function(tekst, "i want to search for")
2