为什么 MIN() 查询会比 ORDER BY X LIMIT 1 慢？

Question

首先，我看到了：Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?

看起来是同一个问题，但是问题是缺少索引。所以让我澄清一下我的情况。

为了概括，我将简化我的两个查询：

-- min:
SELECT min(id) FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY;
-- exec time: ~0.260 s

-- order-limit:
SELECT id FROM my_table WHERE s_time >= now() - INTERVAL 14 DAY ORDER BY s_time, id LIMIT 1;
-- exec time: ~0.060 s

这里，id是主键，s_time是索引时间戳。

运行 explain format=json，表明这两个查询之间的区别在于订单限制版本有一个 ordering_operation，表示 using_filesort: false。两者显示相同的 query_cost 分析。

现在，我对此的理解是，如果列被索引，那么它在 btree 中排序。并且，这些索引条目具有与主键相关的信息。找到第一个（limit 1）应该是简单的btree遍历，而且很快。

然而，执行MIN(primary_key) FROM foo WHERE indexed_entry > bar，应该以同样的方式处理。这仅仅是innoDb优化不佳的情况吗？

如果使用 LIMIT 有一个特殊的优化案例，其中分析条目数量的内存需求，并且如果可能使用优先级队列而不是快速排序，那么 MIN() 不应该成为同一用例的一部分它使用 LIMIT 1?

explain 差异：

min-case:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "91987.68"
    },
    "table": {
      "table_name": "my_table",
      "access_type": "range",
      "possible_keys": [
        "s_time"
      ],
      "key": "s_time",
      "used_key_parts": [
        "s_time"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 229128,
      "rows_produced_per_join": 229128,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "46162.08",
        "eval_cost": "45825.60",
        "prefix_cost": "91987.68",
        "data_read_per_join": "104M"
      },
      "used_columns": [
        "id",
        "s_time"
      ],
      "attached_condition": "(`db`.`my_table`.`s_time` >= <cache>((now() - interval 14 day)))"
    }
  }
}

order-limit

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "92215.71"
    },
    "ordering_operation": {
      "using_filesort": false,
      "table": {
        "table_name": "my_table",
        "access_type": "range",
        "possible_keys": [
          "s_time"
        ],
        "key": "s_time",
        "used_key_parts": [
          "s_time"
        ],
        "key_length": "4",
        "rows_examined_per_scan": 229696,
        "rows_produced_per_join": 229696,
        "filtered": "100.00",
        "using_index": true,
        "cost_info": {
          "read_cost": "46276.51",
          "eval_cost": "45939.20",
          "prefix_cost": "92215.71",
          "data_read_per_join": "105M"
        },
        "used_columns": [
          "id",
          "s_time"
        ],
        "attached_condition": "(`db`.`my_table`.`started_time` >= <cache>((now() - interval 14 day)))"
      }
    }
  }
}

有趣的相关文档：https://dev.mysql.com/doc/dev/mysql-server/8.0.0/filesort_8cc.html

中的方法 bool check_if_pq_applicable()

DESCRIPTION Given a query like this: SELECT ... FROM t ORDER BY a1,...,an LIMIT max_rows; This function tests whether a priority queue should be used to keep the result. Necessary conditions are:

estimate that it is actually cheaper than merge-sort enough memory to store the records.

Answer 1

他们做的事情不同，所以要更加努力。

SELECT  min(id)
    FROM  my_table
    WHERE  s_time >= now() - INTERVAL 14 DAY;

搜索过去两周内的所有项目以找到最低的 id。 INDEX(s_time, id) 会有所帮助。

SELECT  id
    FROM  my_table
    WHERE  s_time >= now() - INTERVAL 14 DAY
    ORDER BY  s_time, id
    LIMIT  1;

如果您有 INDEX(stime, id), then it will look at only one row -- the first one of 14 days ago. No scan. No checking to see if it is the smallestid`。

注意：如果您有 PRIMARY KEY(id), INDEX(stime)，那么该索引实际上是 (stime, id)。

由于您可能按stime顺序插入行，结果可能相同。但是优化器没有知道这些的方法。

为什么 MIN() 查询会比 ORDER BY X LIMIT 1 慢？

Why would MIN() query be slower than ORDER BY X LIMIT 1?

mysql

sql

innodb

query-optimization