How to extract an unlabelled/untyped dependency tree from a TreeAnnotation using Stanford CoreNLP?

Question

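The target language is Spanish.

The English pipeline supports typed dependencies, but as far as I know the Spanish pipeline does not.

The goal is to produce a dependency tree from a TreeAnnotation, where the end result is a list of directed edges. Is this feasible with CoreNLP 3.4.1 and the Spanish models, and if so, how?

Background

I am using Stanford CoreNLP 3.4.1 (plus the 3.5.0 Spanish models for POS tagging; Java 8 is not an option yet for compatibility reasons) with the following configuration: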

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, ner, parse");
props.setProperty("tokenize.options", "invertible=true,ptb3Escaping=true");
props.setProperty("tokenize.language", "es");

props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger");
props.setProperty("ner.model", "edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz");

props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/spanishSR.ser.gz"); //Stanford Parser 3.4.1 shift-reduce models for Spanish. 

props.setProperty("ner.applyNumericClassifiers", "false");
props.setProperty("ner.useSUTime", "false");

The pipeline is then created and used to annotate the document.

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

for(CoreMap sentence: sentences) {

    // ... extract start, end position of sentence ...

    for (CoreLabel token: sentence.get(CoreAnnotations.TokensAnnotation.class)) {

        // ... extract POS tags, NER annotations, id ...
    }

    //This works, and I have a tree that is not empty.
    Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
}

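Using a debugger, I was able to inspect the sentences and tokens, and found that they carry the following annotations:

Sentence (keys)

From edu.stanford.nlp.ling.CoreAnnotations:

TextAnnotation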
CharacterOffsetBeginAnnotation
CharacterOffsetEndAnnotation
TokensAnnotation
TokenBeginAnnotation
TokenEndAnnotation
SentenceIndexAnnotation

From edu.stanford.nlp.trees.TreeCoreAnnotations:

TreeAnnotation

Token (keys)

From edu.stanford.nlp.ling.CoreAnnotations:

TextAnnotation
OriginalTextAnnotation
CharacterOffsetBeginAnnotation
CharacterOffsetEndAnnotation
BeforeAnnotation
AfterAnnotation
IndexAnnotation
SentenceIndexAnnotation
PartOfSpeechAnnotation
NamedEntityTagAnnotation

From edu.stanford.nlp.trees.TreeCoreAnnotations:

HeadWordAnnotation - in my experiments this always points to itself, i.e. the token from which the annotation was retrieved.
HeadTagAnnotation

Thanks in advance!

Answer 1

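CoreNLP does not currently support dependency parsing for Spanish. This includes typed dependency conversion from constituency parses.

A head finder is implemented (but not fully tested). You could hack together an untyped dependency converter using this head finder, but we can't guarantee that it will produce a sensible parse.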
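
The converter suggested above could be sketched roughly as follows. This is a minimal, self-contained illustration rather than CoreNLP code: `Node`, `HeadRule`, `toEdges`, and the toy head rule are hypothetical names invented for this example. In CoreNLP the head choice would instead come from a `HeadFinder` (its `determineHead(Tree)` method), and since the Spanish head finder is not fully tested, the resulting edges may not be sensible. The code avoids Java 8 features because the question rules them out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UntypedDependencySketch {

    /** Toy constituency node standing in for a parse tree (NOT the CoreNLP Tree class). */
    static final class Node {
        final String label;        // phrase category or POS tag; the word itself for a leaf
        final int tokenIndex;      // 1-based token position for leaves, 0 for inner nodes
        final List<Node> children;

        Node(String label, int tokenIndex, Node... children) {
            this.label = label;
            this.tokenIndex = tokenIndex;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Hypothetical head rule: decides which child carries the head of a phrase. */
    interface HeadRule {
        Node headChild(Node phrase);
    }

    /**
     * Returns the head token of the subtree. Along the way, appends one
     * {headIndex, dependentIndex} pair for every non-head child.
     */
    static Node collect(Node node, HeadRule rule, List<int[]> edges) {
        if (node.isLeaf()) {
            return node;
        }
        Node headChild = rule.headChild(node);
        Node head = collect(headChild, rule, edges);
        for (Node child : node.children) {
            if (child != headChild) {
                Node dependent = collect(child, rule, edges);
                edges.add(new int[] {head.tokenIndex, dependent.tokenIndex});
            }
        }
        return head;
    }

    /** Converts a tree into directed, unlabelled edges; the root token hangs off index 0. */
    static List<int[]> toEdges(Node root, HeadRule rule) {
        List<int[]> edges = new ArrayList<int[]>();
        Node head = collect(root, rule, edges);
        edges.add(new int[] {0, head.tokenIndex});
        return edges;
    }

    /** Toy tree for "el perro ladra": (S (NP (D el) (N perro)) (VP (V ladra))). */
    static Node exampleTree() {
        Node np = new Node("NP", 0,
                new Node("D", 0, new Node("el", 1)),
                new Node("N", 0, new Node("perro", 2)));
        Node vp = new Node("VP", 0, new Node("V", 0, new Node("ladra", 3)));
        return new Node("S", 0, np, vp);
    }

    /** Toy rule (an assumption): prefer VP, V, or N children, else take the leftmost child. */
    static HeadRule exampleRule() {
        return new HeadRule() {
            public Node headChild(Node phrase) {
                for (Node child : phrase.children) {
                    String l = child.label;
                    if (l.equals("VP") || l.equals("V") || l.equals("N")) {
                        return child;
                    }
                }
                return phrase.children.get(0);
            }
        };
    }

    public static void main(String[] args) {
        for (int[] edge : toEdges(exampleTree(), exampleRule())) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

For the toy sentence this yields the edges 2 -> 1, 3 -> 2 and 0 -> 3, i.e. "perro" governs "el", "ladra" governs "perro", and "ladra" is the root. The same recursion should carry over to a real parse tree once the head rule is replaced by a head finder and token indices are read from the leaves.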
