Custom POS tagging with NLTK (error)
I'm trying to combine my own simple custom tagger with the nltk default tagger, in this case the perceptron tagger.
My code is as follows (based on this answer):
import nltk.tag, nltk.data
default_tagger = nltk.data.load(nltk.tag._POS_TAGGER)
model = {'example_one': 'VB', 'example_two': 'NN'}
tagger = nltk.tag.UnigramTagger(model=model, backoff=default_tagger)
However, this gives the following error:
  File "nltk_test.py", line 24, in <module>
    default_tagger = nltk.data.load(nltk.tag._POS_TAGGER)
AttributeError: 'module' object has no attribute '_POS_TAGGER'
I tried to fix this by changing the default tagger to:
from nltk.tag.perceptron import PerceptronTagger
default_tagger = PerceptronTagger()
But then I get the following error:
  File "nltk_test.py", line 26, in <module>
    tagger = nltk.tag.UnigramTagger(model=model, backoff=default_tagger)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/tag/sequential.py", line 340, in __init__
    backoff, cutoff, verbose)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/tag/sequential.py", line 284, in __init__
    ContextTagger.__init__(self, model, backoff)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/tag/sequential.py", line 125, in __init__
    SequentialBackoffTagger.__init__(self, backoff)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/tag/sequential.py", line 50, in __init__
    self._taggers = [self] + backoff._taggers
AttributeError: 'PerceptronTagger' object has no attribute '_taggers'
Looking through the nltk.tag documentation, it seems that _POS_TAGGER no longer exists. However, changing it to _pos_tag or pos_tag didn't work either.
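Quick answer: use nltk 3.0.1 for now (pip install nltk==3.0.1).
Better answer: they changed the treebank tagger last September, which has a lot of other implications (we are currently pinned to 3.0.1 because the new tagger is worse, at least for our needs).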
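This seems to work, but I'm not sure about the correctness of the code:
import nltk.tag
from nltk.tag.perceptron import PerceptronTagger

class BackoffTagger:
    # Minimal wrapper that exposes the _taggers attribute UnigramTagger reads from its backoff
    def __init__(self):
        self._taggers = [PerceptronTagger()]

model = {'example_one': 'VB', 'example_two': 'NN'}
tagger = nltk.tag.UnigramTagger(model=model, backoff=BackoffTagger())
tagger.tag(['example_one'])
# Output: [('example_one', 'VB')]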
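One way to check whether the wrapper above is sound is to model the backoff chain in plain Python. The sketch below is not NLTK's code: ChainTagger and WholeSentenceTagger are hypothetical stand-ins. It assumes, from the traceback line `self._taggers = [self] + backoff._taggers`, that a sequential tagger walks that list and asks each entry for a tag through a choose_tag method, and that the perceptron tagger only offers tag(tokens). Both assumptions are worth checking against the sequential.py of your installed version.

```python
class ChainTagger:
    """Toy stand-in for a sequential backoff tagger (hypothetical, not NLTK code)."""

    def __init__(self, model, backoff=None):
        self._model = model
        # Same shape as the line in the traceback: this tagger first,
        # then whatever the backoff object lists in its _taggers attribute.
        self._taggers = [self] + (backoff._taggers if backoff is not None else [])

    def choose_tag(self, tokens, index, history):
        # Dictionary lookup; None means "no opinion, ask the next tagger".
        return self._model.get(tokens[index])

    def tag(self, tokens):
        tags = []
        for index in range(len(tokens)):
            tag = None
            for tagger in self._taggers:
                tag = tagger.choose_tag(tokens, index, tags)
                if tag is not None:
                    break
            tags.append(tag)
        return list(zip(tokens, tags))


class WholeSentenceTagger:
    """Stand-in for a tagger that only has tag(tokens) and no choose_tag."""

    def tag(self, tokens):
        return [(token, 'NN') for token in tokens]


class BackoffTagger:
    """Same wrapper as in the answer above, holding the stand-in tagger."""

    def __init__(self):
        self._taggers = [WholeSentenceTagger()]


tagger = ChainTagger({'example_one': 'VB'}, backoff=BackoffTagger())

# A word found in the model never reaches the wrapped tagger.
known = tagger.tag(['example_one'])

# A word missing from the model falls through to the wrapped tagger,
# which has no choose_tag method in this toy model.
try:
    tagger.tag(['unknown_word'])
    unknown_error = None
except AttributeError as exc:
    unknown_error = exc

print(known)
print(repr(unknown_error))
```

If the real classes behave like this model, the wrapper only appears to work because the test word is in the dictionary, so it is worth trying a word outside the model before relying on it.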
Try the following for custom tagging:
import nltk.tag, nltk.data
from nltk.tag.perceptron import PerceptronTagger
default_tagger = PerceptronTagger()
Define your model with custom tags:
model = {"paining": "Reaction", "Itching": "Reaction", "Removed": "Reaction", "skin": "site"}

class BackoffTagger:
    def __init__(self):
        self._taggers = [PerceptronTagger()]

tagger = nltk.tag.UnigramTagger(model=model, backoff=BackoffTagger())
tagger.tag(['skin'])
Output:
[('skin', 'site')]
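An alternative that avoids touching the private _taggers attribute is to let the default tagger label the whole sentence and then overwrite the tags of the words listed in the model. This is only a minimal sketch: tag_with_overrides and fake_base_tag are hypothetical names, and the stand-in tagger is there so the snippet runs without NLTK data. With NLTK and its tagger data installed, passing nltk.pos_tag as base_tag should behave the same way.

```python
def tag_with_overrides(tokens, overrides, base_tag):
    """Tag tokens with base_tag, then replace the tag of any word found in overrides."""
    tagged = base_tag(tokens)
    return [(word, overrides.get(word, tag)) for word, tag in tagged]


def fake_base_tag(tokens):
    """Stand-in for a real tagger (assumption): labels every token as NN."""
    return [(token, 'NN') for token in tokens]


model = {"paining": "Reaction", "Itching": "Reaction", "Removed": "Reaction", "skin": "site"}

result = tag_with_overrides(['skin', 'rash'], model, fake_base_tag)
print(result)

# With NLTK available, the call would look like:
# import nltk
# tag_with_overrides(['skin', 'rash'], model, nltk.pos_tag)
```

Because the base tagger still sees the full token list, it keeps its sentence context, and the custom tags are applied afterwards. Note that the lookup is case-sensitive, so "Itching" and "itching" are different keys.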