Missing StanfordNLP Universal Dependency features in Java CoreNLP

Question

Using the latest CoreNLP 3.9.2 Java API, I would like to extract the new Universal Dependencies features as they appear in the StanfordNLP Python library and as defined at universaldependencies.org/guidelines.html. Specifically:

Multi-word tokens
POS tags in the Universal Dependencies format (UPOS)
Syntactic dependencies in UD format (using UPOS tags)

The current CoreNLP produces Penn Treebank POS tags and dependencies as described here and here, respectively.

Pipeline configuration:

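    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.pipeline.CoreDocument;
    import edu.stanford.nlp.pipeline.CoreSentence;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    props.setProperty("coref.algorithm", "neural");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    CoreDocument document = new CoreDocument(text); // text: the input string
    pipeline.annotate(document);

    CoreSentence sentence = document.sentences().get(0);
    List<String> posTags = sentence.posTags();               // Penn Treebank POS tags
    SemanticGraph dependencies = sentence.dependencyParse(); // dependency graph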

Any help, and any clarification of my misunderstandings, would be much appreciated.

Answer 1

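The GitHub version of the code and the models for French, German, and Spanish were trained on CoNLL 2018 UD data and support multi-word tokens.

We may or may not train an English UD part-of-speech model.

I believe the constituency parser data uses English-specific part-of-speech tags.

These changes will go into the 4.0.0 release, which will hopefully be finished by the end of the year.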
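
Until an English UPOS model is available, one possible stopgap is to convert the Penn Treebank tags returned by `sentence.posTags()` into UPOS categories with a lookup table. The sketch below is not part of CoreNLP and is not something the answer above proposes; `PtbToUpos` and `toUpos` are hypothetical names, and the table is a coarse approximation. A tag-only mapping cannot separate AUX from VERB for "be", "have", and "do", nor SCONJ from ADP for `IN`, so a few choices here are simplifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a CoreNLP class): coarse Penn Treebank to UPOS lookup. */
final class PtbToUpos {
    private static final Map<String, String> TABLE = new HashMap<>();

    // Register several Penn Treebank tags under one UPOS category.
    private static void put(String upos, String... ptbTags) {
        for (String tag : ptbTags) {
            TABLE.put(tag, upos);
        }
    }

    static {
        put("NOUN", "NN", "NNS");
        put("PROPN", "NNP", "NNPS");
        put("VERB", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"); // assumption: auxiliaries are not split out
        put("AUX", "MD");
        put("ADJ", "JJ", "JJR", "JJS");
        put("ADV", "RB", "RBR", "RBS", "WRB");
        put("PRON", "PRP", "PRP$", "WP", "WP$", "EX");
        put("DET", "DT", "PDT", "WDT");
        put("ADP", "IN", "RP");                               // assumption: IN is never treated as SCONJ
        put("PART", "TO", "POS");
        put("CCONJ", "CC");
        put("NUM", "CD");
        put("INTJ", "UH");
        put("SYM", "SYM", "$", "#");
        put("PUNCT", ".", ",", ":", "``", "''", "-LRB-", "-RRB-", "HYPH");
        put("X", "FW", "LS");
    }

    /** Returns the approximate UPOS category, or "X" for an unknown tag. */
    static String toUpos(String ptbTag) {
        return TABLE.getOrDefault(ptbTag, "X");
    }

    /** Converts a whole sentence's tags, preserving order. */
    static List<String> toUpos(List<String> ptbTags) {
        List<String> out = new ArrayList<>(ptbTags.size());
        for (String tag : ptbTags) {
            out.add(toUpos(tag));
        }
        return out;
    }
}
```

With the pipeline from the question, this would be called as `PtbToUpos.toUpos(sentence.posTags())`. It only relabels the POS tags; it does not produce multi-word tokens or change the dependency relations.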

stanford-nlp