JPA查询优化
JPA query optimization
我们有一个在 GlassFish 4.1 上运行的 JavaEE Web 应用程序。它在少量数据时表现良好,但现在数据变得越来越多。结果是像加载文档这样的简单请求需要大约 1 分钟才能加载,因为它几乎不必要地加载了整个数据库。
这些是实体:
文档实体:
@Entity
@JsonIdentityInfo(generator=JSOGGenerator.class)
@NamedEntityGraph(
name = "graph.Document.single",
attributeNodes = {
@NamedAttributeNode(value = "project", subgraph = "projectSubgraph")
},
subgraphs = {
@NamedSubgraph(
name = "projectSubgraph",
attributeNodes = {
@NamedAttributeNode("users")
}
)
}
)
public class Document extends BaseEntity {
@JsonView({ View.Documents.class, View.Projects.class })
@Column(name = "Name")
private String name;
@JsonView({ })
@JsonProperty(access = Access.WRITE_ONLY)
@Column(name = "Text", columnDefinition = "TEXT")
private String text;
@JsonView({ View.Documents.class })
@ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
optional = false)
@JoinColumn(name = "project_fk")
private Project project;
@JsonView({ View.Documents.class, View.Projects.class })
@OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
mappedBy = "document",
fetch = FetchType.EAGER)
private Set<State> states = new HashSet<>();
@JsonView({ })
@OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
fetch = FetchType.LAZY)
@JoinTable(
name="DOCUMENT_DEFAULTANNOTATIONS",
joinColumns={@JoinColumn(name="DOC_ID", referencedColumnName="id")},
inverseJoinColumns={@JoinColumn(name="DEFANNOTATION_ID", referencedColumnName="id")})
private Set<Annotation> defaultAnnotations = new HashSet<>();
...
}
项目实体:
@Entity
@JsonIdentityInfo(generator=JSOGGenerator.class)
public class Project extends BaseEntity {
@JsonView({ View.Projects.class })
@Column(name = "Name", unique = true)
private String name;
@JsonView({ View.Projects.class })
@OneToMany(mappedBy = "project",
cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
fetch = FetchType.EAGER)
private Set<Document> documents = new HashSet<>();
@JsonView({ View.Projects.class })
@ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE }, fetch = FetchType.EAGER)
@JoinTable(
name="PROJECTS_MANAGER",
joinColumns={@JoinColumn(name="PROJECT_ID", referencedColumnName="id")},
inverseJoinColumns={@JoinColumn(name="MANAGER_ID", referencedColumnName="id")})
private Set<Users> projectManager = new HashSet<>();
@JsonView({ View.Projects.class })
@ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE }, fetch = FetchType.EAGER)
@JoinTable(
name="PROJECTS_WATCHINGUSERS",
joinColumns={@JoinColumn(name="PROJECT_ID", referencedColumnName="id")},
inverseJoinColumns={@JoinColumn(name="WATCHINGUSER_ID", referencedColumnName="id")})
private Set<Users> watchingUsers = new HashSet<>();
@JsonView({ View.Projects.class })
@ManyToMany(mappedBy = "projects",
cascade = { CascadeType.PERSIST, CascadeType.MERGE },
fetch = FetchType.EAGER)
private Set<Users> users = new HashSet<>();
@JsonView({ View.Projects.class })
@ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
fetch = FetchType.EAGER)
@JoinColumn(name="Scheme", nullable = false)
private Scheme scheme;
...
}
数据模型非常复杂并且具有部分循环结构。
对应DocumentDAO:
@Stateless
@TransactionAttribute(TransactionAttributeType.MANDATORY)
public class DocumentDAO extends BaseEntityDAO<Document> {
public DocumentDAO() {
super(Document.class);
}
public Document getDocumentById(Long docId) {
EntityGraph graph = this.em.getEntityGraph("graph.Document.single");
TypedQuery query = em.createQuery("SELECT d.id AS id, d.name AS name, d.project AS project " +
"FROM Document d " +
"JOIN FETCH d.project " +
"WHERE d.id = :id ", Document.class);
query.setParameter("id", docId);
//query.setHint("javax.persistence.loadgraph", graph);
//query.setHint("javax.persistence.fetchgraph", graph); //evokes an exception
Object[] result = (Object[]) query.getSingleResult();
Document doc = new Document();
doc.setId((Long) result[0]);
doc.setName((String) result[1]);
doc.setProject((Project) result[2]);
return doc;
}
}
在简单的 em.find(Document.class, docId) 之前也执行缓慢。所以下一个尝试是创建一个 NamedEntityGraph 来覆盖抓取策略。将图形作为提示传递 (em.find(Document.class, docId, hints)) 没有任何改变。与在 DocumentDAO 中编写 JPQL 查询的行为相同。将 NamedEntityGraph 分配为提示只会引发 "org.eclipse.persistence.exceptions.QueryException.fetchGroupNotSupportOnReportQuery: Fetch group cannot be set on report query"。我启用了 EclipseLink 日志记录,我可以看到该请求引发了大量不必要的 SQL 查询。
目的只是return一个Document对象,包含id、name和对应的project对象。项目对象应该只包含用户。我也想知道为什么 NamedEntityGraph 没有改变任何东西或者我没有正确使用它?
我们使用 EclipseLink 2.6.2 和 PostgreSQL。
更新:
日志中的片段:
[2016-06-05T17:50:27.875+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827875] [levelValue: 500] [[
SELECT t1.ID, t1.EndS, t1.NotSure, t1.StartS, t1.Text, t1.document_fk, t1.targetType_fk, t1.user_fk FROM DOCUMENT_DEFAULTANNOTATIONS t0, ANNOTATION t1 WHERE ((t0.DOC_ID = ?) AND (t1.ID = t0.DEFANNOTATION_ID))
bind => [38]]]
[2016-06-05T17:50:27.877+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827877] [levelValue: 500] [[
SELECT t1.ID, t1.EndS, t1.NotSure, t1.StartS, t1.Text, t1.document_fk, t1.targetType_fk, t1.user_fk FROM DOCUMENT_DEFAULTANNOTATIONS t0, ANNOTATION t1 WHERE ((t0.DOC_ID = ?) AND (t1.ID = t0.DEFANNOTATION_ID))
bind => [39]]]
...
[2016-06-05T17:50:27.771+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827771] [levelValue: 500] [[
SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
bind => [53649]]]
[2016-06-05T17:50:27.773+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827773] [levelValue: 500] [[
SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
bind => [53650]]]
...
[2016-06-05T17:56:50.881+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=30 _ThreadName=http-listener-1(2)] [timeMillis: 1465142210881] [levelValue: 500] [[
SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
bind => [44220]]]
[2016-06-05T17:56:50.886+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=30 _ThreadName=http-listener-1(2)] [timeMillis: 1465142210886] [levelValue: 500] [[
SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
bind => [44221]]]
...
总查询量约100.000。日志记录引用了此请求不需要的其他一些实体。最终结果应该是 500kb 而不是 7.1mb。
Chrome 控制台:
我不知道你的数据,但我认为这是正在发生的事情 -
你有以下热切的联想
document -> project (manyToOne is eager by default)
document -> states
project -> documents
project -> users
user -> ... (this is not shown in question, but there could be other eager associations)
加载文档及其对应的项目后 -
- 已获取所有项目文件
- 获取所有项目用户
- 获取所有文档状态
- 对于第 3 步中的每个文档,获取文档状态
- 对于步骤 2 中的每个用户,加载所有热切关联
你知道我要去哪里了。我认为这是 (n+1) 问题和过度使用预加载的组合,即使您不需要它也是如此。
我会说 'Eager' 获取策略对于复杂的对象图来说并不理想。我会在 JPQL 中使用 'join fetch' 语句将大部分关联作为延迟加载对象图。
我们有一个在 GlassFish 4.1 上运行的 JavaEE Web 应用程序。它在少量数据时表现良好,但现在数据变得越来越多。结果是像加载文档这样的简单请求需要大约 1 分钟才能加载,因为它几乎不必要地加载了整个数据库。
这些是实体:
文档实体:
@Entity
@JsonIdentityInfo(generator=JSOGGenerator.class)
@NamedEntityGraph(
name = "graph.Document.single",
attributeNodes = {
@NamedAttributeNode(value = "project", subgraph = "projectSubgraph")
},
subgraphs = {
@NamedSubgraph(
name = "projectSubgraph",
attributeNodes = {
@NamedAttributeNode("users")
}
)
}
)
public class Document extends BaseEntity {
@JsonView({ View.Documents.class, View.Projects.class })
@Column(name = "Name")
private String name;
@JsonView({ })
@JsonProperty(access = Access.WRITE_ONLY)
@Column(name = "Text", columnDefinition = "TEXT")
private String text;
@JsonView({ View.Documents.class })
@ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
optional = false)
@JoinColumn(name = "project_fk")
private Project project;
@JsonView({ View.Documents.class, View.Projects.class })
@OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
mappedBy = "document",
fetch = FetchType.EAGER)
private Set<State> states = new HashSet<>();
@JsonView({ })
@OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
fetch = FetchType.LAZY)
@JoinTable(
name="DOCUMENT_DEFAULTANNOTATIONS",
joinColumns={@JoinColumn(name="DOC_ID", referencedColumnName="id")},
inverseJoinColumns={@JoinColumn(name="DEFANNOTATION_ID", referencedColumnName="id")})
private Set<Annotation> defaultAnnotations = new HashSet<>();
...
}
项目实体:
@Entity
@JsonIdentityInfo(generator=JSOGGenerator.class)
public class Project extends BaseEntity {
@JsonView({ View.Projects.class })
@Column(name = "Name", unique = true)
private String name;
@JsonView({ View.Projects.class })
@OneToMany(mappedBy = "project",
cascade = { CascadeType.PERSIST, CascadeType.MERGE, CascadeType.REMOVE },
fetch = FetchType.EAGER)
private Set<Document> documents = new HashSet<>();
@JsonView({ View.Projects.class })
@ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE }, fetch = FetchType.EAGER)
@JoinTable(
name="PROJECTS_MANAGER",
joinColumns={@JoinColumn(name="PROJECT_ID", referencedColumnName="id")},
inverseJoinColumns={@JoinColumn(name="MANAGER_ID", referencedColumnName="id")})
private Set<Users> projectManager = new HashSet<>();
@JsonView({ View.Projects.class })
@ManyToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE }, fetch = FetchType.EAGER)
@JoinTable(
name="PROJECTS_WATCHINGUSERS",
joinColumns={@JoinColumn(name="PROJECT_ID", referencedColumnName="id")},
inverseJoinColumns={@JoinColumn(name="WATCHINGUSER_ID", referencedColumnName="id")})
private Set<Users> watchingUsers = new HashSet<>();
@JsonView({ View.Projects.class })
@ManyToMany(mappedBy = "projects",
cascade = { CascadeType.PERSIST, CascadeType.MERGE },
fetch = FetchType.EAGER)
private Set<Users> users = new HashSet<>();
@JsonView({ View.Projects.class })
@ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
fetch = FetchType.EAGER)
@JoinColumn(name="Scheme", nullable = false)
private Scheme scheme;
...
}
数据模型非常复杂并且具有部分循环结构。
对应DocumentDAO:
@Stateless
@TransactionAttribute(TransactionAttributeType.MANDATORY)
public class DocumentDAO extends BaseEntityDAO<Document> {
public DocumentDAO() {
super(Document.class);
}
public Document getDocumentById(Long docId) {
EntityGraph graph = this.em.getEntityGraph("graph.Document.single");
TypedQuery query = em.createQuery("SELECT d.id AS id, d.name AS name, d.project AS project " +
"FROM Document d " +
"JOIN FETCH d.project " +
"WHERE d.id = :id ", Document.class);
query.setParameter("id", docId);
//query.setHint("javax.persistence.loadgraph", graph);
//query.setHint("javax.persistence.fetchgraph", graph); //evokes an exception
Object[] result = (Object[]) query.getSingleResult();
Document doc = new Document();
doc.setId((Long) result[0]);
doc.setName((String) result[1]);
doc.setProject((Project) result[2]);
return doc;
}
}
在简单的 em.find(Document.class, docId) 之前也执行缓慢。所以下一个尝试是创建一个 NamedEntityGraph 来覆盖抓取策略。将图形作为提示传递 (em.find(Document.class, docId, hints)) 没有任何改变。与在 DocumentDAO 中编写 JPQL 查询的行为相同。将 NamedEntityGraph 分配为提示只会引发 "org.eclipse.persistence.exceptions.QueryException.fetchGroupNotSupportOnReportQuery: Fetch group cannot be set on report query"。我启用了 EclipseLink 日志记录,我可以看到该请求引发了大量不必要的 SQL 查询。
目的只是return一个Document对象,包含id、name和对应的project对象。项目对象应该只包含用户。我也想知道为什么 NamedEntityGraph 没有改变任何东西或者我没有正确使用它?
我们使用 EclipseLink 2.6.2 和 PostgreSQL。
更新:
日志中的片段:
[2016-06-05T17:50:27.875+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827875] [levelValue: 500] [[
SELECT t1.ID, t1.EndS, t1.NotSure, t1.StartS, t1.Text, t1.document_fk, t1.targetType_fk, t1.user_fk FROM DOCUMENT_DEFAULTANNOTATIONS t0, ANNOTATION t1 WHERE ((t0.DOC_ID = ?) AND (t1.ID = t0.DEFANNOTATION_ID))
bind => [38]]]
[2016-06-05T17:50:27.877+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827877] [levelValue: 500] [[
SELECT t1.ID, t1.EndS, t1.NotSure, t1.StartS, t1.Text, t1.document_fk, t1.targetType_fk, t1.user_fk FROM DOCUMENT_DEFAULTANNOTATIONS t0, ANNOTATION t1 WHERE ((t0.DOC_ID = ?) AND (t1.ID = t0.DEFANNOTATION_ID))
bind => [39]]]
...
[2016-06-05T17:50:27.771+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827771] [levelValue: 500] [[
SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
bind => [53649]]]
[2016-06-05T17:50:27.773+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=31 _ThreadName=http-listener-1(3)] [timeMillis: 1465141827773] [levelValue: 500] [[
SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
bind => [53650]]]
...
[2016-06-05T17:56:50.881+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=30 _ThreadName=http-listener-1(2)] [timeMillis: 1465142210881] [levelValue: 500] [[
SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
bind => [44220]]]
[2016-06-05T17:56:50.886+0200] [glassfish 4.1] [FINE] [] [org.eclipse.persistence.session./file:/Users/timtoheus/NetBeansProjects/discanno/target/discanno-1.0/WEB-INF/classes/_DiscAnnoPU.sql] [tid: _ThreadID=30 _ThreadName=http-listener-1(2)] [timeMillis: 1465142210886] [levelValue: 500] [[
SELECT t1.ID, t1.LABEL_LabelId FROM ANNOTATION_LABELMAP t0, LABELLABELSETMAP t1 WHERE ((t0.ANNOTATION_ID = ?) AND (t1.ID = t0.MAP_ID))
bind => [44221]]]
...
总查询量约100.000。日志记录引用了此请求不需要的其他一些实体。最终结果应该是 500kb 而不是 7.1mb。
Chrome 控制台:
我不知道你的数据,但我认为这是正在发生的事情 -
你有以下热切的联想
document -> project (manyToOne is eager by default)
document -> states
project -> documents
project -> users
user -> ... (this is not shown in question, but there could be other eager associations)
加载文档及其对应的项目后 -
- 已获取所有项目文件
- 获取所有项目用户
- 获取所有文档状态
- 对于第 3 步中的每个文档,获取文档状态
- 对于步骤 2 中的每个用户,加载所有热切关联
你知道我要去哪里了。我认为这是 (n+1) 问题和过度使用预加载的组合,即使您不需要它也是如此。
我会说 'Eager' 获取策略对于复杂的对象图来说并不理想。我会在 JPQL 中使用 'join fetch' 语句将大部分关联作为延迟加载对象图。