OutOfMemoryError while reproducing BioGrakn Text Mining example with client Java

Question

I am trying to reproduce the White Paper "Text Mined Knowledge Graphs" with the aim of building a text-mined knowledge graph out of my (non-biomedical) document collection later on. Therefore, I built a Maven project out of the classes and the data from the textmining use case of the BioGrakn example in the biograkn repo. My pom.xml looks like this:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>TextMining-BioGrakn</groupId>
  <artifactId>TextMining-BioGrakn</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>TextMining-BioGrakn</name>
  <repositories>
    <repository>
      <id>repo.grakn.ai</id>
      <url>https://repo.grakn.ai/repository/maven/</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>io.grakn.client</groupId>
      <artifactId>api</artifactId>
      <version>1.5.2</version>
    </dependency>
    <dependency>
      <groupId>io.grakn.core</groupId>
      <artifactId>concept</artifactId>
      <version>1.5.3</version>
    </dependency>
    <dependency>
      <groupId>io.graql</groupId>
      <artifactId>lang</artifactId>
      <version>1.0.1</version>
    </dependency>
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.9.2</version>
    </dependency>
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.9.2</version>
      <classifier>models</classifier>
    </dependency>
  </dependencies>
</project>

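Migrating the schema, inserting the PubMed articles and training the model all work fine, but then I get a java.lang.OutOfMemoryError: GC overhead limit exceeded, which is thrown in the mineText() method of the CoreNLP class. The main method in the Migrator class looks like this: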

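import grakn.client.GraknClient;

public class Migrator {

    public static void main(String[] args) {

        // Connect to the local Grakn server and open a session on the "text_mining" keyspace.
        GraknClient graknClient = new GraknClient("localhost:48555");
        GraknClient.Session session = graknClient.session("text_mining");

        try {
            // loadSchema(...) is defined elsewhere in this class (not shown here).
            loadSchema("schema/text-mining-schema.gql", session);
            PubmedArticle.migrate(session);
            CoreNLP.migrate(session);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the session and the client exactly once, even if an error was thrown.
            session.close();
            graknClient.close();
        }
    }
}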

Do you have any idea what might cause this error? Am I missing something fundamental here? Any help is highly appreciated.

Answer 1

You may need to allocate more memory for your program.
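As a rough sketch of what allocating more memory means in practice: the JVM's maximum heap is set with the -Xmx option at launch, and the program can report the limit it actually received. The class name HeapReport and the 4g value below are illustrative assumptions, not part of the BioGrakn example.

```java
// Hypothetical helper (not part of the BioGrakn example): reports the heap limit
// the JVM was actually started with, so you can confirm that -Xmx took effect.
//
// Example launch (the 4g value is only an illustration; pick what your machine allows):
//   java -Xmx4g -cp <your classpath> HeapReport
public class HeapReport {

    /** Returns the maximum heap size the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Returns the heap currently in use (total minus free), in MiB. */
    public static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```

If the program is started from an IDE rather than the command line, the same -Xmx option goes into the run configuration's VM arguments.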

If there is a bug causing this, capture a heap dump (hprof) using the HeapDumpOnOutOfMemoryError flag. (Make sure you put the command-line flags in the right order; see "Generate java dump when OutOfMemory".)
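A minimal sketch for checking the heap dump setup, assuming a HotSpot-based JVM (the com.sun.management API used here is not available on every JVM). The flag is passed as -XX:+HeapDumpOnOutOfMemoryError, optionally together with -XX:HeapDumpPath, and JVM options must come before the main class or -jar on the command line; anything placed after is handed to the program as ordinary arguments. The class name HeapDumpCheck is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of the BioGrakn example): checks whether the running
// JVM will write an hprof file when an OutOfMemoryError occurs.
//
// Example launch (JVM options first, then the main class):
//   java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<dir> -cp <your classpath> Migrator
public class HeapDumpCheck {

    /** Returns the current value of a HotSpot VM option as a string. */
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    /** Returns true if -XX:+HeapDumpOnOutOfMemoryError is active in this JVM. */
    public static boolean heapDumpOnOomEnabled() {
        return Boolean.parseBoolean(vmOption("HeapDumpOnOutOfMemoryError"));
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError enabled: " + heapDumpOnOomEnabled());
        System.out.println("HeapDumpPath: " + vmOption("HeapDumpPath"));
    }
}
```

Calling heapDumpOnOomEnabled() at the start of main is a cheap way to confirm that the flag really reached the JVM and was not passed to the program as an argument by mistake.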

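Once you have the hprof, you can analyze it with the Eclipse Memory Analyzer Tool. It has a very nice "Leak Suspects Report" that you can run at startup, which will help you see what is causing the excessive memory usage. Use "Path to GC root" on any very large objects that look like leaks to see what is keeping them alive on the heap.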

If you need a second opinion on what is causing the leak, check out the IBM Heap Analyzer Tool; it is also very effective.

Good luck!

java

heap-memory

stanford-nlp

vaticle-typedb