How to use spaCy's built-in lemmatiser in a spaCy pipeline?
I want to use lemmatisation, but I can't see directly in the docs how to use spaCy's built-in lemmatiser in a pipeline.
In the docs for the lemmatiser, it says:
Initialize a Lemmatizer. Typically, this happens under the hood within spaCy when a Language subclass and its Vocab is initialized.
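Does this mean the built-in lemmatisation process is an unmentioned part of the pipeline?
It's mentioned in the docs under the pipeline subheading,
whereas the docs for pipeline usage only mention "custom lemmatisation" and how to use that.
This is all rather conflicting information.
Does this mean the built-in lemmatisation process is an unmentioned part of the pipeline?
Simply put, yes. The Lemmatizer is loaded when the Language and Vocab are loaded.
Usage example: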
import spacy

# Load the small English model; the lemmatizer is set up along with it
nlp = spacy.load('en_core_web_sm')
doc = nlp("Apples and oranges are similar. Boots and hippos aren't.")
print('\n')
print("Token Attributes: \n", "token.text, token.pos_, token.tag_, token.dep_, token.lemma_")
for token in doc:
    # Print the text, coarse and fine-grained POS tags, dependency label and lemma
    print("{:<12}{:<12}{:<12}{:<12}{:<12}".format(token.text, token.pos_, token.tag_, token.dep_, token.lemma_))
Output:
Token Attributes:
token.text, token.pos_, token.tag_, token.dep_, token.lemma_
Apples      NOUN        NNS         nsubj       apple
and         CCONJ       CC          cc          and
oranges     NOUN        NNS         conj        orange
are         AUX         VBP         ROOT        be
similar     ADJ         JJ          acomp       similar
.           PUNCT       .           punct       .
Boots       NOUN        NNS         nsubj       boot
and         CCONJ       CC          cc          and
hippos      NOUN        NN          conj        hippos
are         AUX         VBP         ROOT        be
n't         PART        RB          neg         not
.           PUNCT       .           punct       .
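Whether lemmatisation shows up as a named pipeline component seems to depend on the spaCy version: as far as I know, older releases fill in token.lemma_ without listing a separate component, while newer ones list a "lemmatizer" entry in nlp.pipe_names. The sketch below is one way to check on your own install. The helper names has_lemmatizer_component, lemmas and demo are made up for illustration, and demo assumes that spaCy and the en_core_web_sm model are installed.

```python
def has_lemmatizer_component(pipe_names):
    """Return True if a component named 'lemmatizer' is listed in the pipeline."""
    return "lemmatizer" in pipe_names


def lemmas(doc):
    """Collect the lemma of every token in a processed Doc."""
    return [token.lemma_ for token in doc]


def demo():
    """Inspect a loaded model (assumes spaCy and en_core_web_sm are installed)."""
    try:
        import spacy
        nlp = spacy.load("en_core_web_sm")
    except (ImportError, OSError):
        print("spaCy or the en_core_web_sm model is not available")
        return
    # pipe_names lists the components that run after tokenization
    print(nlp.pipe_names)
    print("explicit lemmatizer component:", has_lemmatizer_component(nlp.pipe_names))
    print(lemmas(nlp("Apples and oranges are similar.")))


if __name__ == "__main__":
    demo()
```

Either way, token.lemma_ is read the same way once nlp(text) has run, so the usage example above does not need to change.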
Also check out this thread, which has some interesting information about lemmatisation speed.