MySQL: where exists VS where id in [performance]

Question

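The same problem is raised here: Poor whereHas performance in Laravel ... but it has no answer.

I'm in a situation similar to that question's author:

The replays table has 4M rows
The players table has 40M rows

This query uses where exists and takes a long time (70 seconds) to finish: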

select * from `replays` 
where exists (
    select * from `players` 
    where `replays`.`id` = `players`.`replay_id` 
      and `battletag_name` = 'test') 
order by `id` asc 
limit 100;

But when it is changed to use where id in instead of where exists, it is much faster (0.4 s):

select * from `replays` 
where id in (
    select replay_id from `players` 
    where `battletag_name` = 'test') 
order by `id` asc 
limit 100;

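MySQL (InnoDB) is being used.

I would like to understand why there is such a big performance difference between where exists and where id in. Is it because of the way MySQL works? I expected the "exists" variant to be faster, because MySQL would only need to check whether relevant rows exist... but I was wrong (I probably don't understand how "exists" works in this case).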

Answer 1

You should show the execution plans.
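One way to get the plans, as a rough sketch: prefix each query from the question with EXPLAIN and compare the output. The columns usually worth comparing are select_type, type, key, and rows; what they show depends on the MySQL version and on which indexes exist.

```sql
-- Plan for the slow variant (where exists)
EXPLAIN
select * from `replays`
where exists (
    select * from `players`
    where `replays`.`id` = `players`.`replay_id`
      and `battletag_name` = 'test')
order by `id` asc
limit 100;

-- Plan for the fast variant (where id in)
EXPLAIN
select * from `replays`
where id in (
    select replay_id from `players`
    where `battletag_name` = 'test')
order by `id` asc
limit 100;
```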

To optimize the exists, you want an index on players(replay_id, battletag_name). An index on replays(id) should also help, but if id is a primary key, then there is already an index.
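A minimal sketch of creating that index. The index name idx_players_replay_battletag is made up for illustration, and building an index on a table with 40M rows can take a while, so it is worth checking first what already exists.

```sql
-- List the indexes players already has before adding another one.
SHOW INDEX FROM players;

-- Hypothetical index name; the columns are the ones suggested in the answer.
CREATE INDEX idx_players_replay_battletag
    ON players (replay_id, battletag_name);
```

With both columns in the index, the correlated lookup for each replays row can be answered from the index alone, without reading the players rows themselves.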

Answer 2

Gordon has a good answer. The fact is that performance depends on a lot of different factors, including database design/schema and the volume of data.

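As a rough guide, the exists sub-query is going to execute once for every row in replays, while the in sub-query is going to execute once to get the results of the sub-query, and then those results will be searched for every row in replays.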

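So with exists, the better the indexing/access path, the faster it will run. Without a relevant index it will just read through all the rows until it finds a match, and it does that for every single row in replays. For rows with no match, it ends up reading the entire players table each time. Even rows with a match could read a significant number of players rows before finding one.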

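With in, the smaller the result set from the sub-query, the faster it will run. For rows with no match, it only needs to quickly check the small set of sub-query rows to reach that answer. That said, you don't get the benefit of indexes (if it works this way), so for a large result set from the sub-query it has to read every row in the sub-select before deciding there is no match.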

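That said, database optimizers are pretty clever and don't always evaluate queries exactly the way you ask them to, which is why checking execution plans and testing yourself is important for finding the best approach. It's not unusual to expect a certain execution path only to find that the optimizer has chosen a different method of execution based on how it expects the data to look.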
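To see what the optimizer actually did with a query, one option in MySQL is to run EXPLAIN and then SHOW WARNINGS, which prints the statement as the optimizer rewrote it. This is only a sketch: the exact output, and whether the subquery is turned into a semijoin, depends on the MySQL version (versions before 5.7 need EXPLAIN EXTENDED for the note to appear).

```sql
EXPLAIN
select * from `replays`
where id in (
    select replay_id from `players`
    where `battletag_name` = 'test')
order by `id` asc
limit 100;

-- Prints the query as rewritten by the optimizer. If the subquery was
-- converted to a semijoin, the rewritten text may show it as "semi join".
SHOW WARNINGS;
```

Running the same pair of statements for the where exists variant shows whether the two forms end up being executed differently on a given server.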

Tags: mysql, sql, innodb, database-performance