initCoreNLP() method call from Stanford's R coreNLP package throws an error

Question

I'm trying to use the coreNLP package. When I run the following commands, I get a "GC overhead limit exceeded" error:

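library(rJava)
library(coreNLP)  # provides downloadCoreNLP() and initCoreNLP()

downloadCoreNLP()  # one-time download of the CoreNLP Java libraries and models

initCoreNLP()      # starts the JVM and loads the default annotator pipeline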

The error is:

Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ...
Error in rJava::.jnew("edu.stanford.nlp.pipeline.StanfordCoreNLP", basename(path)) :
  java.lang.OutOfMemoryError: GC overhead limit exceeded
Error during wrapup: cannot open the connection

I don't know much about Java. Can someone help me?

Answer 1

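I tried the following, but neither worked:

options(java.parameters = "-Xmx1000m") - increase the heap size
gc() - trigger a garbage collection

In the end, the problem went away on its own after I restarted the machine!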

Answer 2

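@indi I ran into the same problem (see), but was able to come up with a more reproducible solution than simply restarting.

The full syntax of the init command is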

initCoreNLP(libLoc, parameterFile, mem = "4g", annotators)

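Increasing mem did not help me, but I realized that both you and I were getting stuck on one of the classifiers in the ner (named entity recognition) annotator. Since all I needed was part-of-speech tagging, I replaced the init command with the following: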

initCoreNLP(mem = "8g", annotators = c("tokenize", "ssplit", "pos"))

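With this, the init command ran in a flash with no memory problems. By the way, I only raised mem to 8g because I have that much RAM; I'm sure I could have left it at the default 4g and it would have been fine.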

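I don't know whether you need the ner annotator. If you don't, pass the annotators argument explicitly. The possible values are listed here: http://stanfordnlp.github.io/CoreNLP/annotators.html. Pick only the ones you absolutely need to get the job done. If you do need ner, again work out the minimal set of annotators you need and specify just those.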
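If you do need ner, a minimal call might look like the sketch below. It assumes that your installed version of coreNLP accepts the annotators argument shown in the signature above (check ?initCoreNLP, since the argument list has changed between package versions), and that CoreNLP's ner annotator requires tokenize, ssplit, pos and lemma; verify the requirements for your CoreNLP version on the annotators page linked above. The sample sentence is made up for illustration.

```r
library(coreNLP)

# Load only the annotators NER depends on, instead of the full default pipeline.
# Assumption: this coreNLP version supports the `annotators` argument, and
# CoreNLP's ner annotator requires tokenize, ssplit, pos and lemma.
initCoreNLP(mem = "4g",
            annotators = c("tokenize", "ssplit", "pos", "lemma", "ner"))

# Annotate a sample sentence (hypothetical text) and extract the token table,
# which includes the POS and NER labels produced by the pipeline.
anno <- annotateString("Barack Obama was born in Hawaii.")
tokens <- getToken(anno)
print(tokens)
```

Skipping heavier annotators such as parse or coreference avoids loading their models, which is where most of the memory savings come from.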

So there you go (and hopefully this helps others too)!

Answer 3

I found a more general solution: increase rJava's heap space, as described here:

Cause: the default heap size for libraries that depend on rJava is 512MB, and it is fairly easy to exceed this maximum.

Solution: increase the JVM heap size through rJava's options support:

options(java.parameters = "-Xmx4096m")

Note that this step must be run before any packages are loaded.

Then I ran:

initCoreNLP(mem = "4g")

...and the whole of CoreNLP loaded and ran successfully.
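To confirm that the java.parameters setting actually took effect, you can ask the running JVM for its maximum heap size. Changing java.parameters has no effect once the JVM is already running in the session, which may also be why the same option did not help in Answer 1 until after a restart. The sketch below uses rJava's .jinit() and .jcall() with the standard java.lang.Runtime API; run it in a fresh R session before loading coreNLP.

```r
# Must come first: the heap size is fixed when the JVM starts.
options(java.parameters = "-Xmx4096m")

library(rJava)
.jinit()  # start the JVM with the options above (does not restart a running JVM)

# Query java.lang.Runtime for the maximum heap the JVM will use, in bytes.
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
max_heap_bytes <- .jcall(runtime, "J", "maxMemory")
max_heap_mb <- max_heap_bytes / 1024^2
cat(sprintf("JVM max heap: %.0f MB\n", max_heap_mb))
```

The reported figure can be somewhat lower than the -Xmx value, depending on the garbage collector, but it should be in the gigabyte range rather than around 512MB if the option was applied.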
