Morphology software for English

In my application I need software that can a) convert words to their base forms and b) tell whether they are 'nouns', 'verbs', etc.

I found a list of software that can do this:

http://aclweb.org/aclwiki/index.php?title=Morphology_software_for_English

Does anyone have experience with any of these? Which one would you recommend?

You can use NLTK (Python) for both tasks.

Find if they are 'nouns', 'verbs'...

This task is called part-of-speech tagging. You can use the nltk.pos_tag function (see the Penn Treebank tagset for what the tags mean).
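nltk.pos_tag returns fine-grained Penn Treebank tags (NNS, VBG, ...) rather than plain "noun"/"verb" labels. A minimal sketch, assuming the input is a list of (token, tag) pairs like the ones pos_tag produces, collapses them into coarse word classes by tag prefix. COARSE_POS and coarse_pos are hypothetical helper names, and the sample pairs are copied from the tagger output shown at the end of this answer.

```python
# Hypothetical helper: collapse Penn Treebank tags into coarse word classes.
# Only the first two characters of the tag are used, so e.g. NNS -> "NN".
COARSE_POS = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS
}


def coarse_pos(penn_tag):
    """Return 'noun', 'verb', 'adjective', 'adverb' or 'other' for a Penn tag."""
    return COARSE_POS.get(penn_tag[:2], "other")


# Sample (token, tag) pairs taken from the nltk.pos_tag output in this answer.
tagged = [("The", "DT"), ("rabbits", "NNS"), ("are", "VBP"), ("eating", "VBG")]
for token, tag in tagged:
    print(token, tag, coarse_pos(tag))
```

Tags outside the four open classes (DT, IN, punctuation, ...) simply fall through to "other".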

Convert the words to their basic forms

This task is called lemmatization. You can use the nltk.stem.wordnet.WordNetLemmatizer.lemmatize method.
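WordNetLemmatizer relies on WordNet's morphy routine, which works roughly like this: look the word up in an exception list of irregular forms; otherwise strip part-of-speech-specific suffixes and accept a candidate only if it is a known dictionary entry. The toy sketch below imitates that idea. VOCAB, EXCEPTIONS and RULES are tiny hypothetical tables for illustration, not WordNet's real data, and lemmatize here is a stand-in, not NLTK's method.

```python
# Toy dictionary-based lemmatizer in the spirit of WordNet's morphy.
# All three tables are hypothetical sample data, not WordNet's real tables.
VOCAB = {
    "n": {"rabbit", "garden", "box", "church", "mouse"},
    "v": {"be", "eat", "run", "make"},
}
EXCEPTIONS = {  # irregular forms that suffix rules cannot handle
    "n": {"mice": "mouse"},
    "v": {"are": "be", "is": "be", "was": "be", "ran": "run"},
}
RULES = {  # (suffix to strip, replacement) tried in order, per POS
    "n": [("s", ""), ("es", ""), ("ches", "ch")],
    "v": [("s", ""), ("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e")],
}


def lemmatize(word, pos="n"):
    # 1) irregular form listed in the exception table
    if word in EXCEPTIONS.get(pos, {}):
        return EXCEPTIONS[pos][word]
    # 2) word is already a base form
    if word in VOCAB.get(pos, set()):
        return word
    # 3) try suffix rules; keep the first candidate that is a known base form
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCAB.get(pos, set()):
                return candidate
    # unknown word: return it unchanged, as WordNetLemmatizer does
    return word


print(lemmatize("rabbits"))       # rabbit
print(lemmatize("are", "v"))      # be
print(lemmatize("are"))           # are (treated as a noun, so no match)
print(lemmatize("making", "v"))   # make
```

The last two calls show why the part of speech matters: "are" is only reduced to "be" when it is looked up as a verb. That is why the example below passes a pos argument derived from the tagger output.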

Example

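# Requires NLTK's tokenizer, POS tagger and WordNet data (fetch them with nltk.download()).
import nltk
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet as wn

# Map a Penn Treebank tag to the WordNet POS constant the lemmatizer expects;
# tags WordNet does not cover (DT, IN, punctuation, ...) default to noun.
def penn_to_wn(penn_tag):
    return {'NN': wn.NOUN, 'JJ': wn.ADJ, 'VB': wn.VERB, 'RB': wn.ADV}.get(penn_tag[:2], wn.NOUN)

sentence = "The rabbits are eating in the garden."
tokens = nltk.word_tokenize(sentence)   # split the sentence into tokens
pos_tags = nltk.pos_tag(tokens)         # list of (token, Penn Treebank tag) pairs
wl = WordNetLemmatizer()
# Lemmatize each token using the POS from the tagger
lemmas = [wl.lemmatize(token, pos=penn_to_wn(tag)) for token, tag in pos_tags]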

If you then print the results:

>>> tokens
['The', 'rabbits', 'are', 'eating', 'in', 'the', 'garden', '.']

>>> pos_tags
[('The', 'DT'),
 ('rabbits', 'NNS'),
 ('are', 'VBP'),
 ('eating', 'VBG'),
 ('in', 'IN'),
 ('the', 'DT'),
 ('garden', 'NN'),
 ('.', '.')]

>>> lemmas
['The', u'rabbit', u'be', u'eat', 'in', 'the', 'garden', '.']