Error getting trigrams using gensim's Phrases
I want to extract all bigrams and trigrams from the given sentences.
from gensim.models import Phrases
documents = ["the mayor of new york was there", "Human Computer Interaction is a great and new subject", "machine learning can be useful sometimes","new york mayor was present", "I love machine learning because it is a new subject area", "human computer interaction helps people to get user friendly applications"]
sentence_stream = [doc.split(" ") for doc in documents]
bigram = Phrases(sentence_stream, min_count=1, threshold=2, delimiter=b' ')
trigram = Phrases(bigram(sentence_stream, min_count=1, threshold=2, delimiter=b' '))
for sent in sentence_stream:
    # print(sent)
    bigrams_ = bigram[sent]
    trigrams_ = trigram[bigrams_]
    print(bigrams_)
    print(trigrams_)
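The code works fine for bigrams and captures 'new york' and 'machine learning' as bigrams.
However, I get the following error when I try to build the trigram model.
TypeError: 'Phrases' object is not callable
Please let me know how to correct my code.
I am following the example documentation of gensim.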
According to the docs, you can do:
from gensim.models import Phrases
from gensim.models.phrases import Phraser
phrases = Phrases(sentence_stream)
bigram = Phraser(phrases)
trigram = Phrases(bigram[sentence_stream])
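In your code, bigram is a Phrases object, so it cannot be called like a function, which is what bigram(sentence_stream, ...) does. Apply it with square brackets instead, as in bigram[sentence_stream], and pass min_count and threshold to the outer Phrases constructor.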