Error when using the Python TextBlob library tagger
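I had the TextBlob library working for a while, but then decided to install (using easy_install) an additional library (page here) that claims faster and more accurate tagging.
I couldn't get it to work, so I uninstalled it, but it seems to have broken the tagging feature in TextBlob. I have uninstalled and reinstalled both nltk and TextBlob many times, using both pip and easy_install, and made sure they are up to date.
Here is a simple example script that produces the error: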
from textblob import TextBlob
blob = TextBlob("This is a sentence")
print repr(blob.tags)
It prints the following error:
Traceback (most recent call last):
File "tesst.py", line 5, in <module>
print repr(blob.tags)
File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\decorators.py", line 24, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\blob.py", line 445, in pos_tags
for word, t in self.pos_tagger.tag(self.raw)
File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\decorators.py", line 35, in decorated
return func(*args, **kwargs)
File "C:\Users\Emmet\Anaconda\lib\site-packages\textblob\en\taggers.py", line 34, in tag
tagged = nltk.tag.pos_tag(text)
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\__init__.py", line 110, in pos_tag
tagger = PerceptronTagger()
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\perceptron.py", line 141, in __init__
self.load(AP_MODEL_LOC)
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\tag\perceptron.py", line 209, in load
self.model.weights, self.tagdict, self.classes = load(loc)
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\data.py", line 801, in load
opened_resource = _open(resource_url)
File "C:\Users\Emmet\Anaconda\lib\site-packages\nltk\data.py", line 924, in _open
return urlopen(resource_url)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 431, in open
response = self._open(req, data)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 454, in _open
'unknown_open', req)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\Users\Emmet\Anaconda\lib\urllib2.py", line 1265, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>
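As you can see, the traceback does pass through the perceptron tagger. Is there a way to more thoroughly remove any leftover references to the alternative tagger?
Also note that only the "tags" feature is affected.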
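A note on the last line of the traceback: "unknown url type: c" is what you would expect when a bare Windows path is handed to a URL opener, because the drive letter in front of the colon gets read as a URL scheme. The sketch below only illustrates that mechanism. It is written for Python 3 (the traceback above is from Python 2), the path is hypothetical, and it is not the nltk patch itself.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

# Hypothetical model location, used only for illustration.
MODEL_PATH = r"C:\Users\Emmet\nltk_data\taggers\model.pickle"


def url_scheme(location):
    """Return the scheme a URL parser sees in the given string."""
    return urlparse(location).scheme


def to_file_url(windows_path):
    """Convert an absolute Windows path into a file:// URL."""
    return PureWindowsPath(windows_path).as_uri()


if __name__ == "__main__":
    # The drive letter is mistaken for a scheme.
    print(url_scheme(MODEL_PATH))
    # An explicit file URL has an unambiguous scheme.
    print(to_file_url(MODEL_PATH))
    print(url_scheme(to_file_url(MODEL_PATH)))
```

If this is indeed what happens inside nltk's loader, the problem would be in how the model path is built on Windows rather than in any leftover files from the uninstalled tagger.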
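I found the cause of the problem I had when using the AP tagger. My issue is solved here. More specifically, it is solved by the comment: "Another option is to install nltk and then change 'from textblob.packages import nltk' to 'import nltk' in the taggers.py file."
(Note that this does not correspond to the error message above: that error appears when aptagger is not installed. I got a different error with it installed, and this is the solution to that one.)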
This seems to be an issue with nltk version 3.2. Until it is fixed in a release, you can use this hack: