为什么 MIN() 查询会比 ORDER BY X LIMIT 1 慢?
Why would MIN() query be slower than ORDER BY X LIMIT 1?
首先,我看到了:Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?
看起来是同一个问题,但是问题是缺少索引。所以让我澄清一下我的情况。
为了概括,我将简化我的两个查询:
-- min:
SELECT min(id) FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY;
-- exec time: ~0.260 s
-- order-limit:
SELECT id FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY ORDER BY s_time, id LIMIT 1;
-- exec time: ~0.060 s
这里,id
是主键,s_time
是索引时间戳。
运行 explain format=json
,表明这两个查询之间的区别在于订单限制版本有一个 ordering_operation,表示 using_filesort: false
。两者显示相同的 query_cost
分析。
现在,我对此的理解是,如果列被索引,那么它在 btree 中排序。并且,这些索引条目具有与主键相关的信息。找到第一个(limit 1)应该是简单的btree遍历,而且很快。
然而,执行MIN(primary_key) FROM foo WHERE indexed_entry > bar
,应该以同样的方式处理。这仅仅是innoDb优化不佳的情况吗?
如果使用 LIMIT 有一个特殊的优化案例,其中分析条目数量的内存需求,并且如果可能使用优先级队列而不是快速排序,那么 MIN()
不应该成为同一用例的一部分它使用 LIMIT 1
?
explain
差异:
min-case:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "91987.68"
},
"table": {
"table_name": "my_table",
"access_type": "range",
"possible_keys": [
"s_time"
],
"key": "s_time",
"used_key_parts": [
"s_time"
],
"key_length": "4",
"rows_examined_per_scan": 229128,
"rows_produced_per_join": 229128,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "46162.08",
"eval_cost": "45825.60",
"prefix_cost": "91987.68",
"data_read_per_join": "104M"
},
"used_columns": [
"id",
"s_time"
],
"attached_condition": "(`db`.`my_table`.`s_time` >= <cache>((now() - interval 14 day)))"
}
}
}
order-limit
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "92215.71"
},
"ordering_operation": {
"using_filesort": false,
"table": {
"table_name": "my_table",
"access_type": "range",
"possible_keys": [
"s_time"
],
"key": "s_time",
"used_key_parts": [
"s_time"
],
"key_length": "4",
"rows_examined_per_scan": 229696,
"rows_produced_per_join": 229696,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "46276.51",
"eval_cost": "45939.20",
"prefix_cost": "92215.71",
"data_read_per_join": "105M"
},
"used_columns": [
"id",
"s_time"
],
"attached_condition": "(`db`.`my_table`.`started_time` >= <cache>((now() - interval 14 day)))"
}
}
}
}
有趣的相关文档:https://dev.mysql.com/doc/dev/mysql-server/8.0.0/filesort_8cc.html
中的方法 bool check_if_pq_applicable()
DESCRIPTION Given a query like this: SELECT ... FROM t ORDER BY a1,...,an LIMIT max_rows; This function tests whether a priority queue should be used to keep the result. Necessary conditions are:
estimate that it is actually cheaper than merge-sort
enough memory to store the records.
他们做的事情不同,所以要更加努力。
SELECT min(id)
FROM my_table
WHERE s_time >= now() - INTERVAL 14 DAY;
搜索过去两周内的 所有 项目以找到最低的 id
。 INDEX(s_time, id)
会有所帮助。
SELECT id
FROM my_table
WHERE s_time >= now() - INTERVAL 14 DAY
ORDER BY s_time, id
LIMIT 1;
如果您有 INDEX(stime, id), then it will look at only one row -- the first one of 14 days ago. No scan. No checking to see if it is the smallest
id`。
注意:如果您有 PRIMARY KEY(id), INDEX(stime)
,那么该索引实际上是 (stime, id)
。
由于您可能按stime
顺序插入行,结果可能相同。但是优化器没有知道这些的方法。
首先,我看到了:Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?
看起来是同一个问题,但是问题是缺少索引。所以让我澄清一下我的情况。
为了概括,我将简化我的两个查询:
-- min:
SELECT min(id) FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY;
-- exec time: ~0.260 s
-- order-limit:
SELECT id FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY ORDER BY s_time, id LIMIT 1;
-- exec time: ~0.060 s
这里,id
是主键,s_time
是索引时间戳。
运行 explain format=json
,表明这两个查询之间的区别在于订单限制版本有一个 ordering_operation,表示 using_filesort: false
。两者显示相同的 query_cost
分析。
现在,我对此的理解是,如果列被索引,那么它在 btree 中排序。并且,这些索引条目具有与主键相关的信息。找到第一个(limit 1)应该是简单的btree遍历,而且很快。
然而,执行MIN(primary_key) FROM foo WHERE indexed_entry > bar
,应该以同样的方式处理。这仅仅是innoDb优化不佳的情况吗?
如果使用 LIMIT 有一个特殊的优化案例,其中分析条目数量的内存需求,并且如果可能使用优先级队列而不是快速排序,那么 MIN()
不应该成为同一用例的一部分它使用 LIMIT 1
?
explain
差异:
min-case:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "91987.68"
},
"table": {
"table_name": "my_table",
"access_type": "range",
"possible_keys": [
"s_time"
],
"key": "s_time",
"used_key_parts": [
"s_time"
],
"key_length": "4",
"rows_examined_per_scan": 229128,
"rows_produced_per_join": 229128,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "46162.08",
"eval_cost": "45825.60",
"prefix_cost": "91987.68",
"data_read_per_join": "104M"
},
"used_columns": [
"id",
"s_time"
],
"attached_condition": "(`db`.`my_table`.`s_time` >= <cache>((now() - interval 14 day)))"
}
}
}
order-limit
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "92215.71"
},
"ordering_operation": {
"using_filesort": false,
"table": {
"table_name": "my_table",
"access_type": "range",
"possible_keys": [
"s_time"
],
"key": "s_time",
"used_key_parts": [
"s_time"
],
"key_length": "4",
"rows_examined_per_scan": 229696,
"rows_produced_per_join": 229696,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "46276.51",
"eval_cost": "45939.20",
"prefix_cost": "92215.71",
"data_read_per_join": "105M"
},
"used_columns": [
"id",
"s_time"
],
"attached_condition": "(`db`.`my_table`.`started_time` >= <cache>((now() - interval 14 day)))"
}
}
}
}
有趣的相关文档:https://dev.mysql.com/doc/dev/mysql-server/8.0.0/filesort_8cc.html
中的方法bool check_if_pq_applicable()
DESCRIPTION Given a query like this: SELECT ... FROM t ORDER BY a1,...,an LIMIT max_rows; This function tests whether a priority queue should be used to keep the result. Necessary conditions are:
estimate that it is actually cheaper than merge-sort enough memory to store the records.
他们做的事情不同,所以要更加努力。
SELECT min(id)
FROM my_table
WHERE s_time >= now() - INTERVAL 14 DAY;
搜索过去两周内的 所有 项目以找到最低的 id
。 INDEX(s_time, id)
会有所帮助。
SELECT id
FROM my_table
WHERE s_time >= now() - INTERVAL 14 DAY
ORDER BY s_time, id
LIMIT 1;
如果您有 INDEX(stime, id), then it will look at only one row -- the first one of 14 days ago. No scan. No checking to see if it is the smallest
id`。
注意:如果您有 PRIMARY KEY(id), INDEX(stime)
,那么该索引实际上是 (stime, id)
。
由于您可能按stime
顺序插入行,结果可能相同。但是优化器没有知道这些的方法。