How to calculate the occurrence of a specific sentence in a text?
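How can I use the code below to count how frequently a bigram occurs in example_txt?
Right now I think I only return whether 'order' occurs in the total variable. I would like to calculate the bigram's share of all bigrams.
Suppose building the bigrams for total gives the following:
[('order', 'intake'), ('intake', 'is'), ('is', 'strong'), ('strong', 'for'), ('for', 'q4')]
That means my code should output 0.20, because 'order intake' is 1 of the 5 bigrams.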
from nltk import ngrams

example_txt = "order intake is strong for q4"
bi_gram = 'order intake'

# these turn example_txt and bi_gram into bigrams
n_gram_text = ngrams(example_txt.split(), 2)
n_gram = ngrams(bi_gram.split(), 2)

# these collect the bigrams of the text (total) and of the phrase (bigram)
total = []
bigram = []
for e in n_gram_text:
    total.append(e)
for i in n_gram:
    bigram.append(i)

# this is supposed to report whether bigram exists in total;
# the inner loop variable is t, so total is no longer overwritten
for k in bigram:
    for t in k:
        if t in total:
            print('yes')
            print(k)
        else:
            print(t)
Edit: new title
You can use Counter from the collections module:
from collections import Counter
bigram = ('order', 'intake')
counter_total = Counter(total)
perc_bigram = counter_total[bigram] / sum(counter_total.values())
perc_bigram
Output:
0.2
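For reference, the same calculation can be sketched as one self-contained function. This is only an illustration under a few assumptions: the name `bigram_share` is hypothetical, the text is tokenized with a plain `split()`, and `zip` is swapped in for `nltk.ngrams` so that nothing beyond the standard library is needed.

```python
from collections import Counter


def bigram_share(text, phrase):
    """Return the fraction of bigrams in `text` equal to the two-word `phrase`.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    tokens = text.split()
    # Pair each token with its successor, which yields the same tuples
    # as ngrams(tokens, 2) from nltk.
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0  # fewer than two tokens: there are no bigrams to count
    target = tuple(phrase.split())
    # Counter returns 0 for a missing key, so an absent phrase gives 0.0.
    return Counter(bigrams)[target] / len(bigrams)


print(bigram_share("order intake is strong for q4", "order intake"))  # 0.2
```

Because `Counter` returns 0 for keys it has never seen, a phrase that does not occur simply yields 0.0 rather than raising a `KeyError`.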