Why do I get a full table scan in one case, but not the other?
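EDIT: By the way, I am hitting the problem on MySQL version 5.5.62-38.14-log, although the examples here were run on 5.7.27-0ubuntu0.18.04.1 on my local machine. I have changed UNIX_TIMESTAMP() in the query to TIMESTAMP(), but it made no difference.
Can anyone help with this? I have a fairly simple table: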
mysql> CREATE TABLE `game_instance` (
-> `game_instance_id` bigint(20) NOT NULL AUTO_INCREMENT,
-> `game_id` int(11) NOT NULL,
-> `currency_code` varchar(15) DEFAULT NULL,
-> `start_datetime` timestamp,
-> `status` varchar(20) NOT NULL DEFAULT '' COMMENT 'COMING, NMB = No More Bets, RESOLVED, TB= Taking Bets',
-> `created_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
-> `end_datetime` datetime DEFAULT NULL,
-> `external_ref` varchar(50) DEFAULT NULL,
-> `game_room_id` int(11) DEFAULT NULL,
-> PRIMARY KEY (`game_instance_id`,`start_datetime`),
-> KEY `GI_IDX4` (`external_ref`),
-> KEY `GI_IDX5` (`game_id`,`status`),
-> KEY `game_instance_status` (`status`),
-> KEY `game_instance_end_datetime` (`end_datetime`),
-> KEY `game_instance_start_datetime` (`start_datetime`)
-> ) ENGINE=InnoDB AUTO_INCREMENT=118386942 DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.14 sec)
mysql> explain select * from game_instance where start_datetime >= unix_timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00'));
+----+-------------+---------------+------------+------+------------------------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+------------------------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | game_instance | NULL | ALL | game_instance_start_datetime | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+---------------+------------+------+------------------------------+------+---------+------+------+----------+-------------+
1 row in set, 3 warnings (0.00 sec)
I have an index on start_datetime, but according to EXPLAIN I still get a full table scan.
However:
mysql> create table ex1(
-> id bigint(20),
-> start_datetime timestamp,
-> primary key (id,start_datetime),
-> key (start_datetime)
-> );
Query OK, 0 rows affected (0.02 sec)
mysql> explain select * from ex1 where start_datetime>=unix_timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00'));
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | ex1 | NULL | index | start_datetime | start_datetime | 4 | NULL | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+----------------+----------------+---------+------+------+----------+--------------------------+
1 row in set, 3 warnings (0.00 sec)
The warnings are:
mysql> show warnings;
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1292 | Incorrect datetime value: '1563663600' for column 'start_datetime' at row 1 |
| Warning | 1292 | Incorrect datetime value: '1563663600' for column 'start_datetime' at row 1 |
| Note | 1003 | /* select#1 */ select `ex`.`ex1`.`id` AS `id`,`ex`.`ex1`.`start_datetime` AS `start_datetime` from `ex`.`ex1` where (`ex`.`ex1`.`start_datetime` >= <cache>(unix_timestamp(concat((curdate() - interval 30 day),' ','00:00:00')))) |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
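This seems to indicate that start_datetime is being silently converted behind the scenes, which would explain why the index is not used, but then why does that not happen in both queries? (As a corollary, what is the right way to convert my date string to a MySQL TIMESTAMP?)
EDIT 2:
I have run OPTIMIZE TABLE, as suggested in the comments (I did not run ANALYZE, since OPTIMIZE appears to do that already):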
mysql> optimize table game_instance;
+-----------------------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------------+----------+----------+-------------------------------------------------------------------+
| gameiom.game_instance | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| gameiom.game_instance | optimize | status | OK |
+-----------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (21 min 31.80 sec)
However, this made no difference:
mysql> explain select * from game_instance
where start_datetime >= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00')) and
start_datetime <= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), ' ', '23:59:59'));
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | game_instance | ALL | game_instance_start_datetime | NULL | NULL | NULL | 19065747 | Using where |
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)
This is a real problem, because the table has 19m rows (not 11m, as I said earlier).
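Sometimes the query planner decides whether to scan the whole table or use an index based on statistics about the number and distribution of values in the index. Sometimes it estimates that a full table scan will take fewer CPU and IO resources than index lookups followed by table lookups.
When a table has few rows, the query planner's choices are often counterintuitive. Make sure you have at least a few thousand rows before spending much time trying to understand EXPLAIN output.
Also, the query planner gets better with each MySQL release.
Run OPTIMIZE TABLE game_instance to tidy up your table, especially if you have inserted a lot of rows. Then run ANALYZE TABLE game_instance to recompute the statistics the query planner uses.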
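As a rough sketch of that maintenance step, assuming the table name from the question, the statistics can be refreshed and then inspected. The Cardinality column reported by SHOW INDEX is an estimate of the number of distinct values in each index, which is one of the inputs the planner works from:

```sql
-- Recompute the index statistics used by the query planner.
ANALYZE TABLE game_instance;

-- Inspect the estimated number of distinct values (Cardinality) per index.
SHOW INDEX FROM game_instance;
```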
By the way,
where start_datetime>=unix_timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00'));
is meant to do the same thing as
where start_datetime >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
MySQL knows how to use the results of date calculations directly in TIMESTAMP filters, whereas UNIX_TIMESTAMP() yields an integer, not a TIMESTAMP.
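A sketch of the simpler filter, covering the same window as the query in the question's second edit (from midnight 30 days ago through the end of yesterday). The half-open upper bound on CURDATE() is an assumption of this sketch rather than something from the question; it avoids building a '23:59:59' string:

```sql
EXPLAIN
SELECT *
  FROM game_instance
 WHERE start_datetime >= CURDATE() - INTERVAL 30 DAY  -- midnight, 30 days ago
   AND start_datetime <  CURDATE();                   -- strictly before today's midnight
```

Whether the planner then picks the index still depends on how many rows fall inside the range.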
As for your invalid timestamp warnings, may I suggest you ask another question? Please include your time zone settings in that question.
O. Jones's answer is correct, but let me add some notes on what I did to figure this out. What I was seeing was this, which I could not make sense of:
mysql> explain extended
select * from game_instance
where
start_datetime >= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00')) and
start_datetime <= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), ' ', '23:59:59'));
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | game_instance | ALL | game_instance_start_datetime | NULL | NULL | NULL | 18741262 | 50.00 | Using where |
+----+-------------+---------------+------+------------------------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
So I found out that you can force MySQL to use the index, which gave me:
mysql> explain extended select * from game_instance force index (game_instance_start_datetime) where start_datetime >= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 30 DAY), ' ', '00:00:00')) and start_datetime <= timestamp(CONCAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), ' ', '23:59:59'));
+----+-------------+---------------+-------+------------------------------+------------------------------+---------+------+---------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+-------+------------------------------+------------------------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | game_instance | range | game_instance_start_datetime | game_instance_start_datetime | 4 | NULL | 9391936 | 100.00 | Using where |
+----+-------------+---------------+-------+------------------------------+------------------------------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
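IOW, using the index selects about half of all the rows in the table, and now the filtered column makes sense: it is the estimated percentage of examined rows that satisfy the condition, so in the full scan about half of the rows are discarded because they do not match. That is why MySQL does not use the index: it is less efficient, because you need to alternate between reading the index and looking up the rows in the table.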
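One way to avoid that alternation, sketched here under the assumption that only the key columns are needed: InnoDB secondary indexes also store the primary key columns, so a query that selects just game_instance_id and start_datetime can in principle be answered from game_instance_start_datetime alone. That would also be consistent with the ex1 test table in the question, which has no other columns and showed "Using index".

```sql
-- Select only columns present in the secondary index
-- (start_datetime plus the primary key columns).
EXPLAIN
SELECT game_instance_id, start_datetime
  FROM game_instance
 WHERE start_datetime >= CURDATE() - INTERVAL 30 DAY
   AND start_datetime <  CURDATE();
```

If the other columns are needed, the remaining lever is to narrow the date range so that it matches a smaller fraction of the table.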