"Resource punkt not found" even after manually copying data files into place

Question

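I am trying to tokenize text on a system that is not connected to the Internet. I have successfully used other nltk data (such as stopwords) by copying the *.xml and *.zip files into the nltk_data\corpora folder. For punkt, however, this does not work.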

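In the Anaconda distribution I found it in a "tokenizers" folder, which sits at the same level as the "corpora" folder, and I tried to mimic that layout, with no luck.
I have also tried copying the punkt.xml and punkt.zip files to every location the interpreter says it is searching, again with no luck.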

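I know this is a bit hacky, but this is an offline environment where my ability to copy anything in is very limited, so I need to work with what I already have.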

The shortest code that reproduces the problem is:

from nltk.tokenize import word_tokenize
words = word_tokenize('some text here')

When I run the code (in Spyder), I get this:

LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt')

  Searched in:
    - 'C:\Users\jchase/nltk_data'
    - 'C:\nltk_data'
    - 'D:\nltk_data'
    - 'E:\nltk_data'
    - 'C:\Anaconda3\nltk_data'
    - 'C:\Anaconda3\lib\nltk_data'
    - 'C:\Users\jchase\AppData\Roaming\nltk_data'
**********************************************************************

(Yes, I typed that out by hand.)

Answer 1

Sorry to bother everyone, but I found the answer. I traced the error to data.py in the nltk distribution, where the path is used in:

def _open(resource_url):
    ...

I inspected this resource_url and found that the file/path it was looking for is:

nltk:tokenizers/punkt/english.pickle
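
To see where a resource name like the one above would have to live on disk, here is a minimal sketch that joins it onto each directory from the "Searched in" list and reports whether the file exists. The helper name candidate_paths is hypothetical, and this is only a simplified directory check, not a reproduction of NLTK's own lookup, which does more than this.

```python
from pathlib import Path


def candidate_paths(resource_url, search_dirs):
    """Return (path, exists) pairs for a resource under each search directory.

    Hypothetical helper for illustration; it only checks plain files on disk.
    """
    prefix = "nltk:"
    # Drop the "nltk:" prefix if it is present.
    name = resource_url[len(prefix):] if resource_url.startswith(prefix) else resource_url
    results = []
    for base in search_dirs:
        # Resource names use "/" separators regardless of the operating system.
        path = Path(base).joinpath(*name.split("/"))
        results.append((path, path.is_file()))
    return results


# Example usage (directory taken from the error message above):
# for path, found in candidate_paths("nltk:tokenizers/punkt/english.pickle",
#                                    [r"C:\Anaconda3\lib\nltk_data"]):
#     print(found, path)
```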

So I unzipped punkt.zip into the folder tokenizers/punkt (the full path is C:\Anaconda3\Lib\nltk_data\tokenizers\punkt), and that got me past the error.
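
For anyone who wants to script the same fix, the following is a rough sketch using only the standard library. The function name install_punkt is hypothetical, and it assumes the archive contains a top-level punkt/ folder with english.pickle inside it; if your copy of punkt.zip is laid out differently, adjust the target directory accordingly.

```python
import zipfile
from pathlib import Path


def install_punkt(zip_path, nltk_data_dir):
    """Unpack punkt.zip under <nltk_data_dir>/tokenizers.

    Returns the path to english.pickle, or raises FileNotFoundError if the
    archive did not produce it (assumption: the zip has a top-level punkt/ folder).
    """
    tokenizers_dir = Path(nltk_data_dir) / "tokenizers"
    tokenizers_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(tokenizers_dir)
    expected = tokenizers_dir / "punkt" / "english.pickle"
    if not expected.is_file():
        raise FileNotFoundError(f"{expected} was not created by extracting {zip_path}")
    return expected


# Example usage (the zip location is a placeholder, not from the original post):
# install_punkt(r"C:\path\to\punkt.zip", r"C:\Anaconda3\Lib\nltk_data")
```

After this runs, rerunning the two-line word_tokenize example from the question is the simplest way to confirm that the resource is now found.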

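This question and my answer will probably prompt experts in the Python world to comment that something seems to be wrong with my environment. I agree. This is a virtual machine that I did not build and have little control over.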

Sorry again for the false alarm. I should have let it rest until morning, when everything is fresh.

python

nltk