Getting "bad escape" when using nltk in py3

NLTK version 3.4.5. Python 3.7.4. OSX version 10.14.5.

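I'm upgrading a codebase from 2.7 and have just started running into this issue. I've done a no-cache reinstall of all packages and extensions in a fresh virtualenv. I'm quite confused about how this could be happening only to me; I can't find anyone else online with the same error.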

(venv3) gmoss$ python
Python 3.7.4 (default, Sep  7 2019, 18:27:02) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/site-packages/nltk/__init__.py", line 150, in <module>
    from nltk.translate import *
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/site-packages/nltk/translate/__init__.py", line 23, in <module>
    from nltk.translate.meteor_score import meteor_score as meteor
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/site-packages/nltk/translate/meteor_score.py", line 10, in <module>
    from nltk.stem.porter import PorterStemmer
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/site-packages/nltk/stem/__init__.py", line 29, in <module>
    from nltk.stem.snowball import SnowballStemmer
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/site-packages/nltk/stem/snowball.py", line 314, in <module>
    class ArabicStemmer(_StandardStemmer):
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/site-packages/nltk/stem/snowball.py", line 326, in ArabicStemmer
    r'[\u064b-\u064c-\u064d-\u064e-\u064f-\u0650-\u0651-\u0652]'
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/re.py", line 234, in compile
    return _compile(pattern, flags)
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/sre_parse.py", line 536, in _parse
    code1 = _class_escape(source, this)
  File "/Users/gmoss/Documents/constructor/autocomplete/venv3/lib/python3.7/sre_parse.py", line 337, in _class_escape
    raise source.error('bad escape %s' % escape, len(escape))
re.error: bad escape \u at position 1

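As the error message states, the regex parser rejected the \u escape. Note that Python's re module has accepted \uXXXX escapes in str patterns since 3.3, so a stock 3.7 interpreter would normally compile this pattern without complaint.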
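One way to narrow this down is to compile the character class from the traceback directly, without importing nltk. The following is a minimal sketch, assuming it is run with the same interpreter that fails on `import nltk`; the helper name `compiles_ok` is hypothetical and only there for illustration.

```python
import re
import sys

# Character class copied from the traceback (nltk/stem/snowball.py, ArabicStemmer).
PATTERN = r'[\u064b-\u064c-\u064d-\u064e-\u064f-\u0650-\u0651-\u0652]'


def compiles_ok(pattern):
    """Return True if `re` accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print(sys.version)
    print("pattern compiles:", compiles_ok(PATTERN))
```

If the pattern compiles here while `import nltk` still fails, the interpreter's re module is probably fine and the installed package or the environment is the more likely suspect.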

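What's strange is that the error comes from the nltk package. The authors of that package surely know how to write regular expressions. Did you accidentally pick up the Python 2.7 version of the nltk package, even though it sits in your 3.7 directory?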
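To check which copy of a package the interpreter would actually load, something like the following may help. It is a sketch only: `locate_package` is a hypothetical helper, and it relies on `importlib.util.find_spec`, which looks up a top-level package without importing it, so it still works when `import nltk` itself raises.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file a top-level package would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # The interpreter path shows whether the virtualenv's Python is really in use.
    print("interpreter:", sys.executable)
    print("nltk origin:", locate_package("nltk"))
```

If the printed origin is outside the virtualenv's site-packages, the import is resolving to a different installation than the one you reinstalled.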

I would hope that all the code in the nltk package is covered by unit tests. I would file a bug report against the package.

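In case anyone else runs into this, downgrading to 3.4.2 fixes it, since that version predates the introduction of ArabicStemmer into the file in question. I've opened an issue with nltk and hope it gets resolved.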

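To follow up: this was a false alarm. A faulty cleanup script was deleting NLTK shared object files in my virtualenv, and I guess it was falling back to some other version.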