Stanford LexParser Multithreading

Question

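I've recently been using the Stanford LexParser. Unfortunately, I've run into a problem: it takes a very long time, especially when I pass in a large file. Would multithreading help improve performance? I know multithreading can easily be done from the command line, but I'd like to multithread it internally through the API. Currently I'm using the code below. How can I make it multithreaded?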

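// Create the printer once instead of once per sentence.
TreePrint tp = new TreePrint("typedDependenciesCollapsed");
for (List<HasWord> sentence : new DocumentPreprocessor(fileReader)) {
        Tree parse = lp.apply(sentence);  // lp is the loaded LexicalizedParser
        tp.printTree(parse, pw);          // pw is the output PrintWriter
}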
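One way to parallelize the loop in the question is to hand each sentence to a fixed-size thread pool and collect the results through Futures, so that the output still comes out in input order. The sketch below is stdlib-only: parseToString is a hypothetical stand-in for lp.apply(sentence) plus TreePrint.printTree, and it assumes the real parse call is safe to invoke from several threads at once (the answer below relies on the same property for CoreNLP; verify it for your parser version). Writing stays on a single thread, so the shared PrintWriter is never touched concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelParse {
    // Hypothetical stand-in for lp.apply(sentence) followed by TreePrint.printTree(...);
    // replace the body with the real parser call.
    static String parseToString(List<String> sentence) {
        return "(ROOT " + String.join(" ", sentence) + ")";
    }

    // Parses all sentences on a fixed pool and returns the results in input order.
    static List<String> parseAll(List<List<String>> sentences, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (List<String> s : sentences) {
                futures.add(pool.submit(() -> parseToString(s))); // one task per sentence
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // waits for this sentence; iterating in order preserves order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("The", "cat", "sat", "."),
                Arrays.asList("Dogs", "bark", "."));
        int cores = Runtime.getRuntime().availableProcessors();
        for (String tree : parseAll(sentences, cores)) {
            System.out.println(tree); // only the main thread writes output
        }
    }
}
```

Sizing the pool to the number of cores matches the point made in the answer: beyond that, extra threads add contention rather than speed.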

Answer 1

You can just use regular old Java threads to annotate documents in parallel. For example:

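import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,parse");
// Build the pipeline once and share it across threads.
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

for (int i = 0; i < 100; ++i) {
  new Thread() {
    @Override public void run() {
      // Give each thread its own Annotation; in practice, annotate different documents.
      Annotation ann = new Annotation("your sentence here");
      pipeline.annotate(ann);
      Tree tree = ann.get(SentencesAnnotation.class).get(0).get(TreeAnnotation.class);
    }
  }.start();
}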
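The example above starts 100 raw threads and never waits for them to finish. A common alternative is a fixed pool sized to the core count, with invokeAll blocking until every document is done. The sketch below is stdlib-only: the annotate function is a hypothetical stand-in for creating an Annotation per document and calling pipeline.annotate on it, and it assumes, as the example above does, that one shared pipeline can be used from several threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PooledAnnotate {
    // Runs `annotate` over every document on nThreads workers, waits for all of them,
    // and returns the results in document order.
    static <R> List<R> annotateAll(List<String> docs, Function<String, R> annotate, int nThreads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String doc : docs) {
                tasks.add(() -> annotate.apply(doc)); // one task per document
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : pool.invokeAll(tasks)) { // invokeAll blocks until every task completes
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for: new Annotation(doc); pipeline.annotate(ann); read results.
        Function<String, Integer> fakeAnnotate = doc -> doc.split("\\s+").length;
        List<String> docs = Arrays.asList("your sentence here", "another document");
        System.out.println(annotateAll(docs, fakeAnnotate, Runtime.getRuntime().availableProcessors()));
        // prints [3, 2]
    }
}
```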

Another option is to use the Simple API:

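import edu.stanford.nlp.simple.Sentence;
import edu.stanford.nlp.trees.Tree;

for (int i = 0; i < 100; ++i) {
  new Thread() {
    @Override public void run() {
      // Simple API: each thread builds its own Sentence and parses it.
      Tree tree = new Sentence("your sentence").parse();
    }
  }.start();
}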

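At a higher level, though, you're unlikely to get a dramatically large speedup from multithreading. Parsing is generally slow (O(n^3) with respect to sentence length), and multithreading gives you at most a linear speedup in the number of cores. Another way to make things faster is to use the shift-reduce parser or, if you're OK with dependency rather than constituency parses, the Stanford Neural Dependency Parser.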
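To make the "at most linear" point concrete, here is a back-of-the-envelope calculation under a simplified cost model (an assumption, not a measurement): parsing an n-token sentence costs n^3 units, and each sentence is parsed entirely by one thread. With k threads the wall time can be no less than the total work divided by k, and no less than the cost of the single longest sentence, so the best possible speedup is total / max(total / k, longest).

```java
public class SpeedupBound {
    // Simplified cost model (assumption): a sentence of n tokens costs n^3 units to parse.
    static long cost(int n) {
        return (long) n * n * n;
    }

    // Best-case speedup with k threads when each sentence is parsed by a single thread:
    // wall time is at least max(total / k, cost of the longest sentence).
    static double maxSpeedup(int[] lengths, int k) {
        long total = 0;
        long longest = 0;
        for (int n : lengths) {
            long c = cost(n);
            total += c;
            longest = Math.max(longest, c);
        }
        double lowerBound = Math.max((double) total / k, (double) longest);
        return total / lowerBound;
    }

    public static void main(String[] args) {
        int[] even = {20, 20, 20, 20};
        int[] skewed = {10, 20, 40};
        System.out.println(maxSpeedup(even, 4));   // 4.0: linear in the number of threads
        System.out.println(maxSpeedup(skewed, 4)); // 1.140625: the 40-token sentence dominates
    }
}
```

Because of the cubic cost, one long sentence can account for most of the total work, which is why a faster parsing algorithm usually helps more than additional threads.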


Tags: java, multithreading, nlp, stanford-nlp