Error accessing tree in semcor.tagged_sents()

Question

I am using semcor.tagged_sents() from NLTK.

import nltk
nltk.download('semcor')
from nltk.corpus import semcor

semcor.sents() iterates over all sentences, each represented as a list of tokens:

print(semcor.sents()[0])
>>> ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of', 'Atlanta', "'s", 'recent', 'primary', 'election', 'produced', '``', 'no', 'evidence', "''", 'that', 'any', 'irregularities', 'took', 'place', '.']

and semcor.tagged_sents() iterates over the same sentences with additional annotation, including WordNet lemma identifiers:

semcor.tagged_sents(tag="sem")[0]
>>> [['The'],
 Tree(Lemma('group.n.01.group'), [Tree('NE', ['Fulton', 'County', 'Grand', 'Jury'])]),
 Tree(Lemma('state.v.01.say'), ['said']),
 Tree(Lemma('friday.n.01.Friday'), ['Friday']),
 ['an'],
 Tree(Lemma('probe.n.01.investigation'), ['investigation']),
 ['of'],
 Tree(Lemma('atlanta.n.01.Atlanta'), ['Atlanta']),
 ["'s"],
 Tree(Lemma('late.s.03.recent'), ['recent']),
 Tree(Lemma('primary.n.01.primary_election'), ['primary', 'election']),
 Tree(Lemma('produce.v.04.produce'), ['produced']),
 ['``'],
 ['no'],
 Tree(Lemma('evidence.n.01.evidence'), ['evidence']),
 ["''"],
 ['that'],
 ['any'],
 Tree(Lemma('abnormality.n.04.irregularity'), ['irregularities']),
 Tree(Lemma('happen.v.01.take_place'), ['took', 'place']),
 ['.']]

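My goal is to create a function that takes as input a sentence from SemCor and extracts a list which contains, for each token of the sentence, either the corresponding WordNet Lemma (e.g. Lemma('friday.n.01.Friday')) or None.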

Now, I would like to access the second element of the list above (Tree(Lemma('group.n.01.group'), [Tree('NE', ['Fulton', 'County', 'Grand', 'Jury'])])). However, when I run:

semcor.tagged_sents(tag="sem")[0][1]

I get the following error:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tree.py in _repr_png_(self)
    805                             env_vars=['PATH'],
--> 806                             verbose=False,
    807                         )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\internals.py in find_binary(name, path_to_bin, env_vars, searchpath, binary_names, url, verbose)
    696         find_binary_iter(
--> 697             name, path_to_bin, env_vars, searchpath, binary_names, url, verbose
    698         )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\internals.py in find_binary_iter(name, path_to_bin, env_vars, searchpath, binary_names, url, verbose)
    680     for file in find_file_iter(
--> 681         path_to_bin or name, env_vars, searchpath, binary_names, url, verbose
    682     ):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\internals.py in find_file_iter(filename, env_vars, searchpath, file_names, url, verbose, finding_dir)
    638         div = '=' * 75
--> 639         raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
    640 

LookupError: 

===========================================================================
NLTK was unable to find the gs file!
Use software specific configuration paramaters or set the PATH environment variable.
===========================================================================

During handling of the above exception, another exception occurred:

LookupError                               Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tree.py in _repr_png_(self)
    817                                         "https://docs.brew.sh/Installation then `brew install ghostscript`")                
    818                 print(pre_error_message, file=sys.stderr)
--> 819                 raise LookupError
    820 
    821             with open(out_path, 'rb') as sr:

LookupError: 

Tree(Lemma('group.n.01.group'), [Tree('NE', ['Fulton', 'County', 'Grand', 'Jury'])])

However, the output is still:

Tree(Lemma('group.n.01.group'), [Tree('NE', ['Fulton', 'County', 'Grand', 'Jury'])])

What does this LookupError mean, and what should I do about it?

Answer 1

My goal is to create a function that takes as input a sentence from SemCor and extracts a list which contains, for each token of the sentence, either the corresponding WordNet Lemma (e.g. Lemma('friday.n.01.Friday')) or None.

from nltk.tree import Tree

def lemma_list(sent):
    return [chunk.label() if isinstance(chunk, Tree) else None for chunk in sent]

Example:

lemma_list(semcor.tagged_sents(tag="sem")[0])
#[None, 'group.n.01', 'say.v.01', 'friday.n.01', None, 'investigation.n.01', None, 'atlanta.n.01', None, 'recent.s.02', 'primary_election.n.01', 'produce.v.04', None, None, 'evidence.n.01', None, None, None, 'irregularity.n.01', 'take_place.v.01', None]
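
Note that lemma_list returns one entry per chunk, and a chunk such as the named entity in the question covers four tokens. If you really need one entry per token, a minimal sketch could look like the following. The name token_lemmas and the hand-built sample are my own illustrations (not part of NLTK), the string labels stand in for the Lemma objects, and I am assuming every chunk is either a Tree or a plain list of tokens, as in the output shown in the question.

```python
from nltk.tree import Tree


def token_lemmas(sent):
    """Return one label (or None) per token instead of one per chunk."""
    result = []
    for chunk in sent:
        # Tree subclasses list, so the Tree check has to come first.
        if isinstance(chunk, Tree):
            # A sense-tagged chunk may span several tokens: repeat its label.
            result.extend([chunk.label()] * len(chunk.leaves()))
        else:
            # Untagged chunks are plain lists of tokens.
            result.extend([None] * len(chunk))
    return result


# Hand-built sample mimicking the structure of a SemCor sentence (assumption:
# string labels are used here in place of WordNet Lemma objects).
sample = [
    ['The'],
    Tree('group.n.01', [Tree('NE', ['Fulton', 'County', 'Grand', 'Jury'])]),
    Tree('say.v.01', ['said']),
    ['.'],
]

print(token_lemmas(sample))
# [None, 'group.n.01', 'group.n.01', 'group.n.01', 'group.n.01', 'say.v.01', None]
```

On a real sentence you would call token_lemmas(semcor.tagged_sents(tag="sem")[0]) in the same way, and the result should line up with semcor.sents()[0] token by token.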

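As for the error and what it means, see "NLTK was unable to find the gs file". In short, judging by the traceback, the LookupError is raised in Tree._repr_png_, which IPython calls to render the tree as an image and which needs the Ghostscript binary (gs). NLTK could not find gs on the PATH, so the image rendering failed and IPython fell back to the plain-text representation, which is why the Tree is still shown. The tree itself was accessed without any problem; installing Ghostscript and adding it to the PATH should make the message go away.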
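
To check that the tree itself is fine, you can look at it through its text representation, which should not involve any image rendering. A small sketch on a hand-built Tree (a stand-in for semcor.tagged_sents(tag="sem")[0][1], with a string label in place of the Lemma object):

```python
from nltk.tree import Tree

# Hand-built stand-in for the chunk from the question (assumption: a string
# label replaces the WordNet Lemma object so no corpus download is needed).
chunk = Tree('group.n.01', [Tree('NE', ['Fulton', 'County', 'Grand', 'Jury'])])

# print() and repr() only use the text representation of the tree, so the
# notebook's image rendering (and the Ghostscript lookup) is not triggered.
print(repr(chunk))

# The parts of the tree can be read directly as well.
print(chunk.label())   # the sense label of the chunk
print(chunk.leaves())  # the tokens covered by the chunk
```

In a notebook, the same idea applies to the real data: wrapping the expression in print() instead of leaving it as the last line of a cell should avoid the rich display path that raised the error.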
