SQL query with OR is much slower than two separate queries
When I EXPLAIN the following query:
EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3))
   OR NOT EXISTS (SELECT spl.processInstanceId
                  FROM ProcessInstanceLog spl
                  WHERE l.processinstanceid = spl.processinstanceid);
it produces:
Delete on audittaskimpl l  (cost=8.61..424652.49 rows=38144 width=6)
  ->  Seq Scan on audittaskimpl l  (cost=8.61..424652.49 rows=38144 width=6)
        Filter: ((hashed SubPlan 1) OR (NOT (SubPlan 2)))
        SubPlan 1
          ->  Index Scan using idx_pinstlog_status on processinstancelog spl  (cost=0.29..8.61 rows=1 width=8)
                Index Cond: (status = ANY ('{2,3}'::integer[]))
        SubPlan 2
          ->  Index Only Scan using idx_pinstlog_pinstid on processinstancelog spl_1  (cost=0.29..8.31 rows=1 width=0)
                Index Cond: (processinstanceid = l.processinstanceid)
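This is a back-of-the-envelope reading of the 424652.49 figure. SubPlan 2 is correlated (its Index Cond refers to l.processinstanceid), so it is charged once per outer row. The hashed SubPlan 1 is built only once.

The sketch assumes the outer scan is the same 50859-row, 2005.59-cost Seq Scan on audittaskimpl that appears in the separate plans further down. It is arithmetic on the printed numbers, not PostgreSQL's actual costing code.

```python
from decimal import Decimal

# Figures copied from the EXPLAIN output in the question.
OUTER_ROWS = Decimal("50859")       # estimated rows in audittaskimpl (assumed same Seq Scan as the separate plans)
SEQ_SCAN_COST = Decimal("2005.59")  # total cost of a plain Seq Scan on audittaskimpl
SUBPLAN1_COST = Decimal("8.61")     # hashed SubPlan 1: built once, appears as the startup cost
SUBPLAN2_COST = Decimal("8.31")     # correlated SubPlan 2: one index probe per outer row


def estimated_or_plan_cost() -> Decimal:
    """Approximate total cost of the OR plan under the reading described above."""
    per_row_probes = OUTER_ROWS * SUBPLAN2_COST
    return SUBPLAN1_COST + SEQ_SCAN_COST + per_row_probes


if __name__ == "__main__":
    print("SubPlan 2 repeated per row:", OUTER_ROWS * SUBPLAN2_COST)
    print("estimated plan cost:       ", estimated_or_plan_cost())  # compare with 424652.49
```

The pieces add up to the plan's total. That suggests almost all of the cost is the ~50k repeated index probes of SubPlan 2, not the scan itself.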
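That is an estimated cost of roughly 425k fetches. But since the condition is an OR, in theory I could run the two queries separately and then combine the results. So, the first one: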
EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3));
produces:
Delete on audittaskimpl l  (cost=8.62..2147.72 rows=1 width=12)
  ->  Hash Semi Join  (cost=8.62..2147.72 rows=1 width=12)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=14)
        ->  Hash  (cost=8.61..8.61 rows=1 width=14)
              ->  Index Scan using idx_pinstlog_status on processinstancelog spl  (cost=0.29..8.61 rows=1 width=14)
                    Index Cond: (status = ANY ('{2,3}'::integer[]))
The second one:
EXPLAIN DELETE
FROM AuditTaskImpl l
WHERE NOT EXISTS (SELECT spl.processInstanceId
                  FROM ProcessInstanceLog spl
                  WHERE l.processinstanceid = spl.processinstanceid);
produces:
Delete on audittaskimpl l  (cost=2666.49..5736.94 rows=1 width=12)
  ->  Hash Anti Join  (cost=2666.49..5736.94 rows=1 width=12)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=14)
        ->  Hash  (cost=1781.66..1781.66 rows=50866 width=14)
              ->  Seq Scan on processinstancelog spl  (cost=0.00..1781.66 rows=50866 width=14)
So roughly 8k disk fetches in total (2147.72 + 5736.94).
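Here is the same comparison as plain arithmetic, using the total costs printed by the three plans. The numbers are copied from the question. The ratio compares planner estimates, not measured run times.

```python
from decimal import Decimal

# Total costs copied from the three EXPLAIN outputs above.
OR_PLAN = Decimal("424652.49")        # single DELETE with IN (...) OR NOT EXISTS (...)
SEMI_JOIN_PLAN = Decimal("2147.72")   # IN (... status IN (2,3)) on its own
ANTI_JOIN_PLAN = Decimal("5736.94")   # NOT EXISTS (...) on its own

separate_total = SEMI_JOIN_PLAN + ANTI_JOIN_PLAN
ratio = OR_PLAN / separate_total

print(f"two separate queries: {separate_total}")
print(f"OR query is estimated at {ratio:.1f}x the cost of the two combined")
```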
Both tables contain about 50,000 rows. The database is PostgreSQL 9.3. The examples use DML (DELETE FROM ...), but DQL (SELECT ...) gives the same result.
Here is another example, a SELECT using UNION ALL:
EXPLAIN SELECT l.id
FROM AuditTaskImpl l
WHERE NOT EXISTS (SELECT spl.processInstanceId
                  FROM ProcessInstanceLog spl
                  WHERE l.processinstanceid = spl.processinstanceid)
UNION ALL
SELECT l.id
FROM AuditTaskImpl l
WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                              FROM ProcessInstanceLog spl
                              WHERE spl.status IN (2,3));
produces:
Append  (cost=2616.49..7975.41 rows=2 width=8)
  ->  Hash Anti Join  (cost=2616.49..5827.67 rows=1 width=8)
        Hash Cond: (l.processinstanceid = spl.processinstanceid)
        ->  Seq Scan on audittaskimpl l  (cost=0.00..2005.59 rows=50859 width=16)
        ->  Hash  (cost=1781.66..1781.66 rows=50866 width=8)
              ->  Seq Scan on processinstancelog spl  (cost=0.00..1781.66 rows=50866 width=8)
  ->  Hash Semi Join  (cost=8.62..2147.72 rows=1 width=8)
        Hash Cond: (l_1.processinstanceid = spl_1.processinstanceid)
        ->  Seq Scan on audittaskimpl l_1  (cost=0.00..2005.59 rows=50859 width=16)
        ->  Hash  (cost=8.61..8.61 rows=1 width=8)
              ->  Index Scan using idx_pinstlog_status on processinstancelog spl_1  (cost=0.29..8.61 rows=1 width=8)
                    Index Cond: (status = ANY ('{2,3}'::integer[]))
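The OR form and the UNION ALL form return the same rows because the two branches cannot overlap. A task matched by the IN branch has a matching ProcessInstanceLog row, which makes the NOT EXISTS branch false. So UNION ALL does not produce duplicates.

Below is a minimal sketch that checks this on a tiny made-up dataset. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, so it verifies results only, not plans. `build_db`, `ids` and `delete_via_union_all` are hypothetical helpers. The DELETE-by-id rewrite is an illustration, not something from the question.

```python
import sqlite3


def build_db():
    """Fresh in-memory copy of the two tables with a few made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ProcessInstanceLog (processInstanceId INTEGER, status INTEGER);
        CREATE TABLE AuditTaskImpl (id INTEGER PRIMARY KEY, processInstanceId INTEGER);
        INSERT INTO ProcessInstanceLog VALUES (1, 1), (2, 2), (3, 3);
        -- task 40 points at process 4, which has no log row
        INSERT INTO AuditTaskImpl VALUES (10, 1), (20, 2), (30, 3), (40, 4);
    """)
    return con


OR_QUERY = """
    SELECT l.id FROM AuditTaskImpl l
    WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                                  FROM ProcessInstanceLog spl
                                  WHERE spl.status IN (2, 3))
       OR NOT EXISTS (SELECT spl.processInstanceId
                      FROM ProcessInstanceLog spl
                      WHERE l.processInstanceId = spl.processInstanceId)
"""

UNION_ALL_QUERY = """
    SELECT l.id FROM AuditTaskImpl l
    WHERE NOT EXISTS (SELECT spl.processInstanceId
                      FROM ProcessInstanceLog spl
                      WHERE l.processInstanceId = spl.processInstanceId)
    UNION ALL
    SELECT l.id FROM AuditTaskImpl l
    WHERE l.processInstanceId IN (SELECT spl.processInstanceId
                                  FROM ProcessInstanceLog spl
                                  WHERE spl.status IN (2, 3))
"""


def ids(con, sql):
    """Sorted ids returned by a query; duplicates are kept so they would show up."""
    return sorted(row[0] for row in con.execute(sql))


def delete_via_union_all(con):
    """Hypothetical single DELETE driven by the UNION ALL id list; returns surviving ids."""
    con.execute(f"DELETE FROM AuditTaskImpl WHERE id IN ({UNION_ALL_QUERY})")
    return ids(con, "SELECT id FROM AuditTaskImpl")


if __name__ == "__main__":
    print(ids(build_db(), OR_QUERY))
    print(ids(build_db(), UNION_ALL_QUERY))
    print(delete_via_union_all(build_db()))
```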
So roughly 8k in total. Why is the SQL query with OR so much slower than the two separate queries? Could it be an optimizer problem?
Thanks for any replies!
Why waste time on two queries when one is enough?
DELETE
FROM AuditTaskImpl l
WHERE NOT EXISTS (
    SELECT NULL FROM ProcessInstanceLog spl
    WHERE spl.processInstanceId = l.processInstanceId
      AND spl.status NOT IN (2,3));
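This single NOT EXISTS removes the OR. However, it matches the original only if two assumptions hold, neither of which is stated in the question:

- each processInstanceId occurs at most once in ProcessInstanceLog;
- status is never NULL.

Suppose a process instance can have several log rows, one with status 2 and another with status 1. A task for that process is deleted by the OR version but kept by this one.

Below is a sketch with made-up data, again using sqlite3 as a stand-in for PostgreSQL. `remaining_after` and the sample rows are hypothetical.

```python
import sqlite3

# Original OR condition, written without a DELETE alias to stay within SQLite's grammar.
OR_DELETE = """
    DELETE FROM AuditTaskImpl
    WHERE processInstanceId IN (SELECT spl.processInstanceId
                                FROM ProcessInstanceLog spl
                                WHERE spl.status IN (2, 3))
       OR NOT EXISTS (SELECT spl.processInstanceId
                      FROM ProcessInstanceLog spl
                      WHERE AuditTaskImpl.processInstanceId = spl.processInstanceId)
"""

# The single NOT EXISTS rewrite from the answer.
SINGLE_DELETE = """
    DELETE FROM AuditTaskImpl
    WHERE NOT EXISTS (SELECT NULL FROM ProcessInstanceLog spl
                      WHERE spl.processInstanceId = AuditTaskImpl.processInstanceId
                        AND spl.status NOT IN (2, 3))
"""

# Hypothetical tasks as (id, processInstanceId); process 4 has no log row.
TASKS = [(10, 1), (20, 2), (30, 3), (40, 4)]
# One log row per process instance: the two statements should agree.
UNIQUE_LOG = [(1, 1), (2, 2), (3, 3)]
# Process 2 logged twice with different statuses: the statements diverge.
DUPLICATE_LOG = [(1, 1), (2, 2), (2, 1), (3, 3)]


def remaining_after(delete_sql, log_rows):
    """Run one DELETE on a fresh in-memory database and return the surviving task ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ProcessInstanceLog (processInstanceId INTEGER, status INTEGER NOT NULL)")
    con.execute("CREATE TABLE AuditTaskImpl (id INTEGER PRIMARY KEY, processInstanceId INTEGER)")
    con.executemany("INSERT INTO ProcessInstanceLog VALUES (?, ?)", log_rows)
    con.executemany("INSERT INTO AuditTaskImpl VALUES (?, ?)", TASKS)
    con.execute(delete_sql)
    survivors = sorted(row[0] for row in con.execute("SELECT id FROM AuditTaskImpl"))
    con.close()
    return survivors


if __name__ == "__main__":
    for name, log in (("unique", UNIQUE_LOG), ("duplicate", DUPLICATE_LOG)):
        print(name, remaining_after(OR_DELETE, log), remaining_after(SINGLE_DELETE, log))
```

If the one-row-per-instance assumption holds in the real schema, the single statement is a drop-in replacement. If it does not hold, the UNION ALL approach from the question keeps the original semantics.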