CoreNLP CoNLL format with CollapsedCCProcessedDependenciesAnnotation
I'm using the latest version of CoreNLP.
My task is to parse text and get the output in CoNLL format using CollapsedCCProcessedDependenciesAnnotation.
I run the following command:
time java -cp $CoreNLP/javanlp-core.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props $CoreNLP/config.properties -file 12309959 -outputFormat conll
depparse.model = english_SD.gz
The problem is how to get CollapsedCCProcessedDependenciesAnnotation.
I tried using depparse.extradependencies in config.properties, but according to
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/GrammaticalStructure.Extras.html#REF_ONLY_COLLAPSED
there is no option for CCProcessedDependenciesAnnotation.
Can you suggest a way to get CollapsedCCProcessedDependenciesAnnotation in CoNLL output?
You can retrieve the CC-processed dependencies programmatically.
This question should be a good example (see the code in it that uses CollapsedCCProcessedDependenciesAnnotation).
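Once the edges are retrieved programmatically, a quick check shows whether a sentence's collapsed CC-processed graph is a tree at all, which is what CoNLL requires. The following is a minimal, self-contained sketch, not CoreNLP code: it assumes a hypothetical `Edge` record (governor index, relation name, dependent index, with 0 meaning ROOT) standing in for CoreNLP's `SemanticGraphEdge`, and reports every token that has more than one distinct governor.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class MultiHeadCheck {
    // Hypothetical stand-in for a CoreNLP SemanticGraphEdge:
    // governor index, relation name, dependent index (1-based; governor 0 = ROOT).
    record Edge(int gov, String rel, int dep) {}

    // Returns the indices of tokens with two or more distinct governors,
    // i.e. the tokens that cannot be given a single HEAD value in CoNLL.
    static SortedSet<Integer> tokensWithMultipleHeads(List<Edge> edges) {
        Map<Integer, Set<Integer>> heads = new HashMap<>();
        for (Edge e : edges) {
            heads.computeIfAbsent(e.dep(), k -> new HashSet<>()).add(e.gov());
        }
        SortedSet<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Set<Integer>> entry : heads.entrySet()) {
            if (entry.getValue().size() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

In a sentence with coordinated verbs, for instance, the shared subject typically gets one `nsubj` edge per verb once conjuncts are propagated, so it shows up in this set.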
Gabor's answer on the mailing list explains this behavior well (i.e., why you can't output the collapsed dependencies directly):
Note that in general the collapsed cc processed dependencies won't output losslessly to conll though, as the format expects a tree (every word has a unique parent), and the dependencies can have multiple heads.
The output formatter therefore uses the basic dependencies only: https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/CoNLLOutputter.java#L118. This could be changed in the code without crashing anything, but the serialized trees would be missing some edges, and ties for which edges are included would be broken somewhat arbitrarily. You may be better off writing your own logic for dumping to conll to fit your particular use case (you can probably copy much of our conll outputter code from above).
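Following Gabor's suggestion to write your own dumping logic, here is a minimal sketch of one possible policy. It is an assumption of this example and not what CoreNLP's `CoNLLOutputter` does: the first edge seen for each token wins, later edges into that token are dropped and returned so they are not silently lost, and a token with no incoming edge (e.g. a preposition or conjunction that was collapsed into a relation name) gets `_` for HEAD and DEPREL. `Edge` and `Result` are hypothetical types. In CoreNLP you would fill the edge list from the `SemanticGraph` stored under `CollapsedCCProcessedDependenciesAnnotation`, e.g. via its edges' governor/dependent indices and relation (not shown here). Lines use the CoNLL-X 10-column layout with unknown fields written as `_`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConllDump {
    // Hypothetical stand-in for a CoreNLP SemanticGraphEdge (1-based indices, 0 = ROOT).
    record Edge(int gov, String rel, int dep) {}

    // One CoNLL-X line per token, plus the edges that did not fit the one-head-per-token format.
    record Result(List<String> lines, List<Edge> dropped) {}

    static Result toConll(List<String> words, List<Edge> edges) {
        Map<Integer, Edge> chosen = new HashMap<>();
        List<Edge> dropped = new ArrayList<>();
        for (Edge e : edges) {
            // putIfAbsent returns the existing edge if this token already has a head:
            // the first edge in input order wins, later ones are dropped.
            if (chosen.putIfAbsent(e.dep(), e) != null) {
                dropped.add(e);
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= words.size(); i++) {
            Edge e = chosen.get(i);
            // Tokens with no incoming edge get "_" for HEAD and DEPREL.
            String head = e == null ? "_" : Integer.toString(e.gov());
            String rel = e == null ? "_" : e.rel();
            // CoNLL-X columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
            lines.add(String.join("\t", Integer.toString(i), words.get(i - 1),
                    "_", "_", "_", "_", head, rel, "_", "_"));
        }
        return new Result(lines, dropped);
    }
}
```

Which edge wins is exactly the arbitrary tie-breaking Gabor warns about, so returning the dropped edges lets you log or inspect them. If they matter for your use case, an alternative is to write all heads of a token into an extra column, similar to the DEPS column that CoNLL-U uses for enhanced dependencies.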