Is a where query faster with IN or a subquery to another table?

Question

Suppose you have a table "users" with 100,000 records, and you need to look up 3,000 of them by id.

Would it be faster to run the query with an inline list:

Select * from users where id IN (2,5,30,89,...) # 3000 items

Or would it be faster to store those 3,000 ids in another table and use a subquery, for example:

Select * from users where id IN (select distinct id from lookuptable)
# lookuptable contains the 3000 records

Or is it exactly the same? Thanks!

Answer 1

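The best way to find out is to run EXPLAIN ANALYZE against a working data set. It shows you the query's execution time and the plan the query followed.

The query optimizer may use different techniques depending on table size, database settings, memory settings, and so on.

If the lookup table holds only the 3,000 records, you don't need DISTINCT. If it is much larger, with many more records, and DISTINCT is what reduces it to 3,000 unique values, then the first solution might be faster.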
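To illustrate the point about DISTINCT: IN only tests membership, so duplicate ids in the lookup table do not change which rows come back. Below is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the real database (it is not PostgreSQL, and the table sizes are scaled down for illustration).

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE lookuptable (id INTEGER)")  # no key, duplicates allowed

# 1,000 users; the lookup table stores each of 30 ids three times.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    ((i, f"user{i}") for i in range(1, 1001)),
)
conn.executemany(
    "INSERT INTO lookuptable VALUES (?)",
    ((i,) for i in range(1, 31) for _ in range(3)),
)

with_distinct = conn.execute(
    "SELECT id FROM users WHERE id IN (SELECT DISTINCT id FROM lookuptable) ORDER BY id"
).fetchall()
without_distinct = conn.execute(
    "SELECT id FROM users WHERE id IN (SELECT id FROM lookuptable) ORDER BY id"
).fetchall()

# Both queries return each matching user exactly once.
print(len(with_distinct), len(without_distinct), with_distinct == without_distinct)
```

Whether dropping DISTINCT also makes the query faster depends on the planner, which is what EXPLAIN ANALYZE on the real database would show.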

Answer 2

In PostgreSQL, the fastest way is to create a lookup table and query it like this:

SELECT * FROM users AS u
WHERE EXISTS (SELECT 1 FROM lookuptable AS l
              WHERE u.id = l.id);
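The three forms discussed in this thread (inline IN list, IN subquery, EXISTS) should return the same rows, so the choice is purely about performance. A rough sketch for checking that and getting a first timing impression follows. It assumes SQLite through Python's sqlite3 module rather than PostgreSQL, and the choice of every 33rd id is arbitrary, so the timings it prints say nothing definitive about a PostgreSQL server.

```python
import sqlite3
import time

# Assumption: in-memory SQLite as a stand-in; table names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE lookuptable (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    ((i, f"user{i}") for i in range(1, 100_001)),
)

# Hypothetical sample: 3,000 ids spread across the 100,000 users.
wanted_ids = list(range(1, 100_001, 33))[:3000]
conn.executemany("INSERT INTO lookuptable VALUES (?)", ((i,) for i in wanted_ids))

queries = {
    # The ids are integers generated above, so inlining them is safe here.
    "IN list": f"SELECT * FROM users WHERE id IN ({','.join(map(str, wanted_ids))})",
    "IN subquery": "SELECT * FROM users WHERE id IN (SELECT id FROM lookuptable)",
    "EXISTS": (
        "SELECT * FROM users AS u "
        "WHERE EXISTS (SELECT 1 FROM lookuptable AS l WHERE u.id = l.id)"
    ),
}

results = {}
for label, sql in queries.items():
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    results[label] = sorted(rows)
    print(f"{label}: {len(rows)} rows in {elapsed_ms:.2f} ms")
```

For a real comparison, the same three statements would be run under EXPLAIN ANALYZE on the actual PostgreSQL database, several times each.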

Answer 3

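I created a database matching the question and tested it. In terms of timing there was really no difference, but that may be down to my sandbox test environment.

Anyway, I "explained" these three queries: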

1- select * from users where id in (1,2,3,4,5,6,7,8,9,10,..3000)

Cost: "Index Scan using users_pkey on users (cost=4.04..1274.75 rows=3000 width=11)"

" Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10 (...)"

2- SELECT * FROM users AS u WHERE EXISTS (SELECT 1 FROM lookuptable AS l WHERE u.id = l.id); <- note that I removed the 'distinct', it is useless

Cost: "Merge Semi Join (cost=103.22..364.35 rows=3000 width=11)"

" Merge Cond: (u.id = l.id)"

" -> Index Scan using users_pkey on users (cost=0.29..952.68 rows=30026 width=11)"

3- Select * from users where id IN (select id from lookuptable)

Cost: "Merge Semi Join (cost=103.22..364.35 rows=3000 width=11)"

" Merge Cond: (users.id = lookuptable.id)"

" -> Index Scan using users_pkey on users (cost=0.29..952.68 rows=30026 width=11)"

" -> Index Only Scan using lookuptable_pkey on lookuptable (cost=0.28..121.28 rows=3000 width=4)"
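When reading these plans, it helps to know that PostgreSQL prints cost as startup..total, in the planner's own arbitrary units rather than milliseconds. The small sketch below only does the arithmetic on the two top-level plan lines quoted above; the regular expression and variable names are illustrative and not part of any PostgreSQL tooling.

```python
import re

# Top-level plan lines copied from the EXPLAIN output quoted in this answer.
plan_lines = {
    "IN list": "Index Scan using users_pkey on users (cost=4.04..1274.75 rows=3000 width=11)",
    "EXISTS / IN subquery": "Merge Semi Join (cost=103.22..364.35 rows=3000 width=11)",
}

# cost=<startup>..<total>, both in planner units (estimates, not timings).
COST = re.compile(r"cost=(\d+\.\d+)\.\.(\d+\.\d+)")

totals = {}
for label, line in plan_lines.items():
    startup, total = map(float, COST.search(line).groups())
    totals[label] = total
    print(f"{label}: startup={startup}, total={total}")

ratio = totals["IN list"] / totals["EXISTS / IN subquery"]
print(f"estimated total cost ratio: {ratio:.2f}")
```

The planner estimates the IN list at roughly 3.5 times the total cost of the semi join, yet the measured timings showed no real difference, which is why the estimates should be checked against EXPLAIN ANALYZE on real data.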

[Figure: explain diagram for the last two queries]

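In any case, as I read in some of the comments above, you also have to add the cost of populating the lookup table to the cost of the query. You also have to split the "querying" into separate executions, which can cause "transactional problems". I would use the first query.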
