Optimize MySQL indexes to query in less than a second
A very simple min-max SQL query, with no joins and no nesting, takes more than 2 seconds.
TABLE STRUCTURE:::
> DESCRIBE tbl;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ... | ... | ... | ... | ... | ... |
| created_at | datetime | YES | MUL | NULL | |
+-------------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
The table contains more than 10,000,000 rows.
INDEXES IN THE TABLE:::
> SHOW INDEX IN tbl;
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tbl | 0 | PRIMARY | 1 | id | A | 10000545 | NULL | NULL | | BTREE | | |
| tbl | 1 | created_at | 1 | created_at | A | 18 | NULL | NULL | YES | BTREE | | |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
SQL OF CONCERN::: Find the min-max datetime of the last 10k entries
SELECT
min(created_at),
max(created_at)
FROM tbl
ORDER BY id DESC
LIMIT 10000
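PROBLEM::: The first query always takes more than 2 seconds to complete. After that first select, all subsequent calls complete in less than 0.001 seconds, unless a new row is inserted into the table.
First call, taking 2.06 seconds: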
> SELECT min(created_at), max(created_at) FROM tbl USE INDEX (created_at) ORDER BY id DESC LIMIT 10000;
+---------------------+---------------------+
| min(created_at) | max(created_at) |
+---------------------+---------------------+
| 2010-01-01 00:00:00 | 2015-12-28 00:00:00 |
+---------------------+---------------------+
1 row in set (2.06 sec)
Subsequent call, taking 0.00 seconds:
> SELECT min(created_at), max(created_at) FROM tbl USE INDEX (created_at) ORDER BY id DESC LIMIT 10000;
+---------------------+---------------------+
| min(created_at) | max(created_at) |
+---------------------+---------------------+
| 2010-01-01 00:00:00 | 2015-12-28 00:00:00 |
+---------------------+---------------------+
1 row in set (0.00 sec)
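After a new row is added to the table, the query again takes more than 2 seconds to complete, and then all subsequent calls complete in less than 0.001 seconds.
I know the indexes are rearranged every time a new row is inserted, so that part is fine. However, my goal is to bring the first query time down to a few milliseconds, because spending more than 2 seconds per query on a frequently updated system is too heavy a performance cost.
EXPLANATION OF THE QUERY PLAN::: The EXPLAIN statement shows that the query traverses almost all rows of the table. So I guess there is room for improvement through indexing. But what should I index?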
> EXPLAIN SELECT min(created_at), max(created_at) FROM tbl ORDER BY id DESC LIMIT 10000;
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
| 1 | SIMPLE | tbl | index | NULL | created_at | 9 | NULL | 10000545 | Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+----------+-------------+
1 row in set (0.00 sec)
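Your original query does not return "the minimum/maximum date from the last 10000 entries". LIMIT is applied after the aggregate functions are processed, so what you are asking is "give me the max/min date, then limit that to the first 10k rows"... and there is only one row. This is consistent with the EXPLAIN output above, which scans all of the roughly 10 million entries in the created_at index.
You have to use a subquery for this: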
SELECT min(created_at), max(created_at)
FROM (SELECT created_at
      FROM tbl
      ORDER BY id DESC  -- newest ids first, so LIMIT keeps the last 10k rows
      LIMIT 10000) subtable;
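One way to see the order in which aggregation and LIMIT are applied, as a minimal sketch rather than a benchmark: COUNT(*) stands in for min/max here because its result makes the number of aggregated rows visible. The alias last_rows is made up for illustration.

```sql
-- Aggregation runs first and yields a single row; LIMIT then has nothing to trim.
-- This counts every row in tbl, not 10000.
SELECT COUNT(*) FROM tbl LIMIT 10000;

-- The LIMIT inside the derived table is applied before the outer aggregate,
-- so the count is at most 10000.
SELECT COUNT(*)
FROM (SELECT id
      FROM tbl
      ORDER BY id DESC
      LIMIT 10000) AS last_rows;
```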
Your best bet is an index on (id, created_at): the subquery then only has to walk the index, and the min/max query only has to go through at most 10k values.
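A sketch of how that index could be added and checked, assuming the table is named tbl as in the question; the index name idx_id_created_at is made up for illustration. If the table uses InnoDB, rows are already stored in primary-key order, so it is worth comparing the EXPLAIN output before and after adding the index to confirm that it actually helps.

```sql
-- Hypothetical index name; the columns follow the suggestion above.
ALTER TABLE tbl ADD INDEX idx_id_created_at (id, created_at);

-- Inspect the plan for the inner query: the aim is an ordered index read
-- limited to about 10000 rows instead of a scan of ~10 million entries.
EXPLAIN
SELECT min(created_at), max(created_at)
FROM (SELECT created_at
      FROM tbl
      ORDER BY id DESC
      LIMIT 10000) AS subtable;
```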