Count words (even multiples) in a text with Python
I have to write a function that counts how many times a word (or a series of words) occurs in a given text.
This is my function so far. What I noticed is that it works fine for a series of 3 words, but not for 4 words and so on.
from nltk import ngrams

def function(text, word):
    for char in ".?!-":
        text = text.replace(char, ' ')
    n = len(word.split())
    countN = 0
    bigram_lower = text.lower()
    word_lower = word.lower()
    n_grams = ngrams(bigram_lower.split(), n)
    for gram in n_grams:
        for i in range(0, n):
            if gram[i] == word_lower.split()[i]:
                countN = countN + 1
    print(countN)
Firstly, please fix your indentation, and don't use bigrams as the variable name for your n-grams, because it is a bit confusing (you are not storing only bigrams in that variable). Secondly, let's look at this part of your code -
for gram in bigrams:
    for i in range(0, n):
        if gram[i] == word_lower.split()[i]:
            countN = countN + 1
print(countN)
Here you are incrementing countN by 1 every time a single word inside the n-gram matches, instead of incrementing it only when the whole n-gram matches. You should increment countN only if all the words match -
for gram in bigrams:
    if list(gram) == word_lower.split():
        countN = countN + 1
print(countN)
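For reference, here is a minimal, self-contained sketch of that fix; the helper name count_phrase and the sample sentence are only illustrative, not from the original post:

from nltk import ngrams

def count_phrase(text, phrase):
    words = phrase.lower().split()
    n = len(words)
    count = 0
    # Count an n-gram only when every word in it matches the phrase
    for gram in ngrams(text.lower().split(), n):
        if list(gram) == words:
            count += 1
    return count

print(count_phrase("the cat sat on the mat because the cat was tired", "the cat"))  # prints 2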
This may have already been done here.
Is nltk mandatory?
# Open the file in read mode
text = open("sample.txt", "r")

# Create an empty dictionary
d = dict()

# Loop through each line of the file
for line in text:
    # Remove the leading spaces and newline character
    line = line.strip()

    # Convert the characters in line to
    # lowercase to avoid case mismatch
    line = line.lower()

    # Split the line into words
    words = line.split(" ")

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1

# Print the contents of dictionary
for key in list(d.keys()):
    print(key, ":", d[key])
This should work for you:
import nltk

def function(text, word):
    for char in ".?!-,":
        text = text.replace(char, ' ')
    n = len(word.split())
    countN = 0
    bigram_lower = text.lower()
    word_lower = tuple(word.lower().split())
    bigrams = nltk.ngrams(bigram_lower.split(), n)
    for gram in bigrams:
        if gram == word_lower:
            countN += 1
    print(countN)
>>> tekst="this is the text i want to search, i want to search it for the words i want to search for, and it should count the occurances of the words i want to search for"
>>> function(tekst, "i want to search")
4
>>> function(tekst, "i want to search for")
2