Why is the Stanford parser with NLTK not correctly parsing a sentence?
I am using the Stanford parser with NLTK in Python, and I followed "Stanford Parser and NLTK" to set up the Stanford NLP libraries.
from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser

parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
one = "John sees Bill"

# Constituency parse
parsed_Sentence = parser.raw_parse(one)
# GUI
for line in parsed_Sentence:
    print(line)
    line.draw()

# Dependency parse, converted to trees
parsed_Sentence = [parse.tree() for parse in dep_parser.raw_parse(one)]
print(parsed_Sentence)
# GUI
for line in parsed_Sentence:
    print(line)
    line.draw()
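I get an incorrect parse tree and dependency tree, as shown in the example below: the parser treats 'sees' as a noun instead of a verb.
What should I do?
When I change the sentence, it works fine, for example (one = 'John see Bill').
The correct output for this sentence can be seen here: correct output of parse tree
An example of the correct output is shown below: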
Once again, no model is perfect ;P
You can try a "more accurate" parser, the NeuralDependencyParser.
First set up the parser properly with the correct environment variables (see "Stanford Parser and NLTK" and https://gist.github.com/alvations/e1df0ba227e542955a8a), then:
>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordNeuralDependencyParser
>>> parser = StanfordNeuralDependencyParser(model_path="edu/stanford/nlp/models/parser/nndep/english_UD.gz")
>>> stanford_dir = parser._classpath[0].rpartition('/')[0]
>>> slf4j_jar = stanford_dir + '/slf4j-api.jar'
>>> parser._classpath = list(parser._classpath) + [slf4j_jar]
>>> parser.java_options = '-mx5000m'
>>> sent = "John sees Bill"
>>> [parse.tree() for parse in parser.raw_parse(sent)]
[Tree('sees', ['John', 'Bill'])]
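As a rough sketch of the setup step and the classpath tweak in the session above: the snippet below sets the two environment variables that, as far as I know, NLTK's Stanford wrappers look at (CLASSPATH and STANFORD_MODELS), and reproduces the string manipulation that locates slf4j-api.jar next to the first classpath entry. The directory /opt/stanford-parser, the jar name stanford-parser.jar, and both helper functions are hypothetical names for illustration only; substitute the paths from your own install.

```python
import os

# Hypothetical install location (an assumption, not from the original post).
STANFORD_PARSER_DIR = "/opt/stanford-parser"


def configure_stanford_env(parser_dir):
    """Point the environment variables at the directory holding the jars.

    Assumes the parser jar and the models jar live in the same directory.
    """
    os.environ["CLASSPATH"] = parser_dir
    os.environ["STANFORD_MODELS"] = parser_dir


def slf4j_jar_next_to(first_classpath_entry):
    """Mirror the rpartition trick from the session above.

    Everything before the last '/' is taken as the Stanford directory,
    and slf4j-api.jar is assumed to sit in that same directory.
    """
    stanford_dir = first_classpath_entry.rpartition('/')[0]
    return stanford_dir + '/slf4j-api.jar'


configure_stanford_env(STANFORD_PARSER_DIR)

# Hypothetical first classpath entry, standing in for parser._classpath[0].
example_entry = STANFORD_PARSER_DIR + "/stanford-parser.jar"
slf4j_jar = slf4j_jar_next_to(example_entry)
print(slf4j_jar)
```

Note that rpartition('/') assumes forward-slash paths; on Windows something like os.path.dirname would likely be safer.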
Note that the NeuralDependencyParser only produces dependency trees.
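To make the shape of a dependency tree concrete, here is a toy sketch in plain Python (no NLTK or Stanford jars needed). It assumes the correct analysis of "John sees Bill" is two arcs from the verb: sees -> John (subject) and sees -> Bill (object). The function build_dependency_tree and the nested-tuple representation are made up for illustration; they are not NLTK's Tree class, and keying on the word itself only works because no word repeats in this sentence.

```python
from collections import defaultdict


def build_dependency_tree(arcs, root):
    """Turn (head, dependent) arcs into a nested (word, [children]) structure.

    Leaves are returned as plain strings, echoing how the output above
    shows 'John' and 'Bill' directly under 'sees'.
    """
    children = defaultdict(list)
    for head, dependent in arcs:
        children[head].append(dependent)

    def subtree(word):
        kids = children.get(word, [])
        if not kids:
            return word
        return (word, [subtree(kid) for kid in kids])

    return subtree(root)


# Assumed arcs for "John sees Bill": the verb heads both nouns.
arcs = [("sees", "John"), ("sees", "Bill")]
tree = build_dependency_tree(arcs, "sees")
print(tree)  # ('sees', ['John', 'Bill'])
```

There are no phrase labels such as NP or VP in this structure, which is the difference from the constituency tree that StanfordParser returns.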