How to extract the information I want with NLTK
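I want to extract information related to several topics. For example:
- Product information
- Customers' purchasing experience
- Recommendations from family or friends
As a first step, I extracted text from one of the websites. For example: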
i think AIA does a more better life insurance as my comparison and
the companies comparisonand most important is also medical insurance
in my opinionyes there are some agents that will sell u plans that
their commission is high...dun worry u buy insurance from a company
anything happens u can contact back the company also can ...better
find a agent that is reliable and not just working for the commission
for now , they might not service u in the future...thanksregardsdiana
""
Then, using NLTK in VS2015, I tokenized the text:
import nltk
toks = nltk.word_tokenize(text)
With pos_tag I can tag the tokens:
postoks = nltk.tag.pos_tag(toks)
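From this point on, I am not sure what to do. Previously I used IBM Text Analytics. In that software I would create dictionaries, then define some patterns, and then analyze the data. For example: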
Sample of Dictionary: insurance_cmp : {AIA, IMG, SABB}
Sample of pattern:
insurance_cmp + Good_Feeling_Pattern
insurance_cmp + ['purchase|Buy'] + Bad_Feeling_Pattern
Good_Feeling_Pattern = [good, like it, nice]
Bad_Feeling_Pattern = [bad, worse, not good, regret]
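For what it's worth, the dictionary and pattern samples above can be roughly approximated in plain Python with the `re` module. The following is only a minimal sketch under assumptions that are not part of the original patterns: slots may be separated by up to five filler words (`GAP`), matching is case-insensitive, and a good-feeling term directly preceded by "not " is skipped so that "not good" is not counted as good. The names `alternation`, `PATTERNS` and `find_opinions` are made up for illustration.

```python
import re

# Dictionaries copied from the samples above.
INSURANCE_CMP = ["AIA", "IMG", "SABB"]
PURCHASE = ["purchase", "buy"]
GOOD_FEELING = ["good", "like it", "nice"]
BAD_FEELING = ["bad", "worse", "not good", "regret"]

# Assumption: up to five filler words may sit between two slots of a pattern.
GAP = r"(?:\W+\w+){0,5}?\W+"


def alternation(terms):
    """Turn a dictionary into a regex alternation, longest term first."""
    ordered = sorted(terms, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(t) for t in ordered) + ")"


CMP = alternation(INSURANCE_CMP)

PATTERNS = [
    # insurance_cmp + Good_Feeling_Pattern (skipping terms preceded by "not ")
    ("good", re.compile(
        rf"\b(?P<company>{CMP})\b{GAP}"
        rf"(?P<feeling>(?<!not ){alternation(GOOD_FEELING)})\b",
        re.IGNORECASE)),
    # insurance_cmp + ['purchase|Buy'] + Bad_Feeling_Pattern
    ("bad", re.compile(
        rf"\b(?P<company>{CMP})\b{GAP}{alternation(PURCHASE)}\b{GAP}"
        rf"(?P<feeling>{alternation(BAD_FEELING)})\b",
        re.IGNORECASE)),
]


def find_opinions(text):
    """Return (label, company, feeling) tuples for every pattern hit."""
    hits = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            hits.append((label, m.group("company"), m.group("feeling").lower()))
    return hits


print(find_opinions("I think AIA is really nice."))
```

A pure regex approach like this ignores part-of-speech information, so it is probably best seen as a baseline to compare a chunker-based approach against.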
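I would like to know whether I can simulate the same thing in NLTK. Can a chunker and a custom grammar help me extract what I am looking for? Do you have any suggestions for how I could improve this?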
grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns
    NP:
        {<NBAR>}
        {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
"""
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(postoks)
Could you please help me with what the next step toward my goal should be?
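One possible next step, sketched here rather than offered as a definitive answer, is to walk the chunk tree, collect the text of each NP chunk, and check those phrases against the dictionary. The sketch below reuses the grammar from the question. The hand-tagged tokens are an assumption standing in for the output of `pos_tag` so that the example runs without downloading a tagger model, and `noun_phrases` and `COMPANIES` are illustrative names.

```python
import nltk

# Same grammar as in the question.
GRAMMAR = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # nouns and adjectives, terminated with nouns
    NP:
        {<NBAR>}
        {<NBAR><IN><NBAR>}  # above, connected with in/of/etc.
"""

# Dictionary from the question (insurance_cmp).
COMPANIES = {"AIA", "IMG", "SABB"}


def noun_phrases(tagged_tokens):
    """Chunk (word, tag) pairs and return the text of every NP chunk."""
    tree = nltk.RegexpParser(GRAMMAR).parse(tagged_tokens)
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
    ]


# Assumption: tags written by hand, in place of nltk.tag.pos_tag(toks).
sample = [("AIA", "NNP"), ("does", "VBZ"), ("a", "DT"),
          ("better", "JJR"), ("life", "NN"), ("insurance", "NN")]

phrases = noun_phrases(sample)
mentioned = [p for p in phrases if p in COMPANIES]
print(phrases, mentioned)
```

From there, the phrases near a company mention could be compared with the good and bad feeling lists, which is roughly what the dictionary-plus-pattern setup described earlier does.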