JPA query optimization

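We have a JavaEE web application running on GlassFish 4.1. It performed well with a small amount of data, but the data set keeps growing. As a result, even a simple request such as loading a document now takes about 1 minute, because it needlessly loads almost the entire database.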

These are the entities:

Document entity:

@Entity
@JsonIdentityInfo(generator=JSOGGenerator.class)
@NamedEntityGraph(
    name = "graph.Document.single",
    attributeNodes = {
        @NamedAttributeNode(value = "project", subgraph = "projectSubgraph")
    },
    subgraphs = {
        @NamedSubgraph(
            name = "projectSubgraph",
            attributeNodes = {
                @NamedAttributeNode("users")
            }
        )
    }
)
public class Document extends BaseEntity {

    @JsonView({ View.Documents.class, View.Projects.class })
    @Column(name = "Name")
    private String name;

    @JsonView({ })
    @JsonProperty(access = Access.WRITE_ONLY)
    @Column(name = "Text", columnDefinition = "TEXT")
    private String text;

    @JsonView({ View.Documents.class })
    @ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
                optional = false)
    @JoinColumn(name = "project_fk")
    private Project project;

    @JsonView({ View.Documents.class, View.Projects.class })
    @OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
                mappedBy = "document",
                fetch = FetchType.EAGER)
    private Set<State> states = new HashSet<>();

    @JsonView({ })
    @OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
                fetch = FetchType.LAZY)
    @JoinTable(
        name="DOCUMENT_DEFAULTANNOTATIONS",
        joinColumns={@JoinColumn(name="DOC_ID", referencedColumnName="id")},
        inverseJoinColumns={@JoinColumn(name="DEFANNOTATION_ID", referencedColumnName="id")})
    private Set<Annotation> defaultAnnotations = new HashSet<>();

    ...
}

Project entity:

@Entity
@JsonIdentityInfo(generator=JSOGGenerator.class)
public class Project extends BaseEntity {

    @JsonView({ View.Projects.class })
    @Column(name = "Name", unique = true)
    private String name;

    @JsonView({ View.Projects.class })
    @OneToMany(mappedBy = "project",
                cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
                fetch = FetchType.EAGER)
    private Set<Document> documents = new HashSet<>();

    @JsonView({ View.Projects.class })
    @ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE }, fetch = FetchType.EAGER)
    @JoinTable(
        name="PROJECTS_MANAGER",
        joinColumns={@JoinColumn(name="PROJECT_ID", referencedColumnName="id")},
        inverseJoinColumns={@JoinColumn(name="MANAGER_ID", referencedColumnName="id")})
    private Set<Users> projectManager = new HashSet<>();

    @JsonView({ View.Projects.class })
    @ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE }, fetch = FetchType.EAGER)
    @JoinTable(
        name="PROJECTS_WATCHINGUSERS",
        joinColumns={@JoinColumn(name="PROJECT_ID", referencedColumnName="id")},
        inverseJoinColumns={@JoinColumn(name="WATCHINGUSER_ID", referencedColumnName="id")})
    private Set<Users> watchingUsers = new HashSet<>();

    @JsonView({ View.Projects.class })
    @ManyToMany(mappedBy = "projects",
                cascade = { CascadeType.PERSIST, CascadeType.MERGE },
                fetch = FetchType.EAGER)
    private Set<Users> users = new HashSet<>();

    @JsonView({ View.Projects.class })
    @ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
                fetch = FetchType.EAGER)
    @JoinColumn(name="Scheme", nullable = false)
    private Scheme scheme;

    ...
}

The data model is quite complex and partly cyclic.

The corresponding DocumentDAO:

@Stateless
@TransactionAttribute(TransactionAttributeType.MANDATORY)
public class DocumentDAO extends BaseEntityDAO<Document> {

    public DocumentDAO() {
        super(Document.class);
    }

    public Document getDocumentById(Long docId) {

        EntityGraph graph = this.em.getEntityGraph("graph.Document.single");

        TypedQuery query = em.createQuery("SELECT d.id AS id, d.name AS name, d.project AS project " +
                                    "FROM Document d " +
                                    "JOIN FETCH d.project " +
                                    "WHERE d.id = :id ", Document.class);
        query.setParameter("id", docId);
        //query.setHint("javax.persistence.loadgraph", graph);
        //query.setHint("javax.persistence.fetchgraph", graph); //evokes an exception
        Object[] result  = (Object[]) query.getSingleResult();

        Document doc = new Document();
        doc.setId((Long) result[0]);
        doc.setName((String) result[1]);
        doc.setProject((Project) result[2]);

        return doc;
    }

}

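Before this, a plain em.find(Document.class, docId) was just as slow. So the next attempt was to create a NamedEntityGraph to override the fetch strategy. Passing the graph as a hint (em.find(Document.class, docId, hints)) changed nothing. Writing a JPQL query in the DocumentDAO behaves the same way. Setting the NamedEntityGraph as a hint on that query only raises "org.eclipse.persistence.exceptions.QueryException.fetchGroupNotSupportOnReportQuery: Fetch group cannot be set on report query". I enabled EclipseLink logging and can see that the request triggers a huge number of unnecessary SQL queries.

The goal is simply to return a Document object containing the id, the name and the corresponding project object. The project object should only contain the users. I would also like to know why the NamedEntityGraph does not change anything, or am I not using it correctly?

We use EclipseLink 2.6.2 and PostgreSQL.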

Update:

Snippet from the log:

[2016-06-05T17:50:27.875+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827875] [levelValue: 500] [[
  SELECT t1.ID, t1.EndS, t1.NotSure, t1.StartS, t1.Text, t1.document_fk, t1.targetType_fk, t1.user_fk FROM DOCUMENT_DEFAULTANNOTATIONS t0, ANNOTATION t1 WHERE ((t0.DOC_ID = ?) AND (t1.ID = t0.DEFANNOTATION_ID))
    bind => [38]]]

[2016-06-05T17:50:27.877+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827877] [levelValue: 500] [[
  SELECT t1.ID, t1.EndS, t1.NotSure, t1.StartS, t1.Text, t1.document_fk, t1.targetType_fk, t1.user_fk FROM DOCUMENT_DEFAULTANNOTATIONS t0, ANNOTATION t1 WHERE ((t0.DOC_ID = ?) AND (t1.ID = t0.DEFANNOTATION_ID))
    bind => [39]]]

... 

[2016-06-05T17:50:27.771+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827771] [levelValue: 500] [[
  SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
    bind => [53649]]]

[2016-06-05T17:50:27.773+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827773] [levelValue: 500] [[
  SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
    bind => [53650]]]

...

[2016-06-05T17:56:50.881+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=30 _ThreadName=http-listener-1(2)] [timeMillis: 1465142210881] [levelValue: 500] [[
  SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
    bind => [44220]]]

[2016-06-05T17:56:50.886+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=30 _ThreadName=http-listener-1(2)] [timeMillis: 1465142210886] [levelValue: 500] [[
  SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
    bind => [44221]]]

...

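The total number of queries is about 100,000. The log references several other entities that are not needed for this request. The final result should be 500 KB instead of 7.1 MB.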

[Screenshot: Chrome console]

I don't know your data, but I think this is what is happening:

You have the following eager associations:

document -> project  (manyToOne is eager by default)
document -> states
project -> documents
project -> users 
user -> ... (this is not shown in question, but there could be other eager associations)

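After a document and its corresponding project are loaded:

  1. All documents of the project are fetched
  2. All users of the project are fetched
  3. All states of the document are fetched
  4. For each document from step 1, its states are fetched
  5. For each user from step 2, all of its eager associations are loaded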

You can see where this is going. I think this is a combination of the (n+1) problem and overuse of eager loading, even where you don't need it.
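The cascade described in the steps above can be modeled in a few lines of plain Java. This is only a toy sketch, not JPA: the class name EagerCascadeDemo, the counter, and the assumption that each user triggers exactly one extra SELECT are all hypothetical, and the real entities in the question have more eager associations than are modeled here.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of eager fetching; every simulated SELECT increments a counter. */
class EagerCascadeDemo {
    private final int documentsInProject;
    private final int usersInProject;
    // Stands in for the persistence context: an entity is resolved only once.
    private final Set<String> persistenceContext = new HashSet<>();
    private int selects;

    EagerCascadeDemo(int documentsInProject, int usersInProject) {
        this.documentsInProject = documentsInProject;
        this.usersInProject = usersInProject;
    }

    /** Loads document 0 and returns the number of simulated SELECT statements. */
    int loadOneDocument() {
        persistenceContext.clear();
        selects = 0;
        selects++;                   // SELECT the requested Document row
        resolveDocument(0);
        return selects;
    }

    private void resolveDocument(int docId) {
        if (!persistenceContext.add("document:" + docId)) {
            return;                  // already managed, nothing more to fetch
        }
        selects++;                   // eager Document.states
        resolveProject();            // eager Document.project
    }

    private void resolveProject() {
        if (!persistenceContext.add("project")) {
            return;                  // the cycle Document -> Project -> Document stops here
        }
        selects++;                   // SELECT the Project row
        selects++;                   // eager Project.documents (one query, many rows)
        for (int docId = 0; docId < documentsInProject; docId++) {
            resolveDocument(docId);  // every sibling document is eager as well
        }
        selects++;                   // eager Project.users (one query, many rows)
        for (int user = 0; user < usersInProject; user++) {
            selects++;               // assumption: one eager association per user
        }
    }

    public static void main(String[] args) {
        EagerCascadeDemo demo = new EagerCascadeDemo(100, 10);
        System.out.println("SELECTs for one document: " + demo.loadOneDocument());
    }
}
```

Even in this reduced model the number of statements grows with the number of documents and users in the project, although only a single document was requested. Each further eager level (annotations, label maps, and so on) multiplies the effect.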

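I would say the 'Eager' fetch strategy is not ideal for complex object graphs. I would map most of the associations as lazy and load the part of the object graph that is actually needed with 'join fetch' statements in JPQL.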
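In JPA terms this would mean setting fetch = FetchType.LAZY on the associations and selecting the entity itself, for example with a query along the lines of "SELECT d FROM Document d JOIN FETCH d.project WHERE d.id = :id". The effect can be illustrated with a small plain-Java model. It is a sketch only: Lazy, LazyFetchDemo and the sample values are hypothetical stand-ins and not part of JPA or EclipseLink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Toy model of lazy associations plus one explicit "join fetch"; not real JPA. */
class LazyFetchDemo {
    static int selects = 0;          // number of simulated SELECT statements

    /** Loads its value on first access only, like a LAZY association. */
    static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) { this.loader = loader; }

        /** An association that arrived together with its owner (join fetch). */
        static <V> Lazy<V> preloaded(V value) {
            Lazy<V> lazy = new Lazy<V>(null);
            lazy.value = value;
            lazy.loaded = true;
            return lazy;
        }

        T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    static final class Project {
        final String name;
        final Lazy<List<String>> users;
        Project(String name, Lazy<List<String>> users) {
            this.name = name;
            this.users = users;
        }
    }

    static final class Document {
        final String name;
        final Lazy<Project> project;
        final Lazy<List<String>> states;
        Document(String name, Lazy<Project> project, Lazy<List<String>> states) {
            this.name = name;
            this.project = project;
            this.states = states;
        }
    }

    static List<String> selectUsers()  { selects++; return Arrays.asList("alice", "bob"); }
    static List<String> selectStates() { selects++; return Arrays.asList("draft", "reviewed"); }

    /** Models the join fetch query: one SELECT returns the document and its project. */
    static Document findWithProject() {
        selects++;
        Project project = new Project("demo", new Lazy<List<String>>(LazyFetchDemo::selectUsers));
        return new Document("doc", Lazy.preloaded(project),
                new Lazy<List<String>>(LazyFetchDemo::selectStates));
    }

    public static void main(String[] args) {
        Document doc = findWithProject();
        System.out.println(doc.project.get().name + ", selects so far: " + selects);
        System.out.println(doc.project.get().users.get() + ", selects so far: " + selects);
    }
}
```

The states are never touched in main, so no statement is issued for them. With lazy mappings, the cost of a request is determined by what the calling code actually reads, not by everything reachable from the entity.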