Why does this ORDER BY in sub-query workaround not work consistently?

Question

To get the most recent record for a given combination of identifiers, I use the following query:

SELECT t1.*
FROM (
    SELECT id, b_id, c_id
    FROM a
    ORDER BY epoch DESC
    LIMIT 18446744073709551615
) AS t1
GROUP BY t1.b_id, t1.c_id

If there are multiple records for a combination of b_id + c_id, it always selects the one with the highest value of epoch (and, as such, the latest in time).

The LIMIT is added as a workaround to force MariaDB to actually order the results. I successfully use this construction a lot in my application, and so have others.

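However, I have now run into an otherwise identical query in my application where I "accidentally" selected more columns in the sub-query than strictly necessary: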

SELECT t1.*
FROM (
    SELECT id, b_id, c_id, and, some, other, columns, ...
    FROM a
    ORDER BY epoch DESC
    LIMIT 18446744073709551615
) AS t1
GROUP BY t1.b_id, t1.c_id

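I have tested both queries. The exact same query, with only those extra columns changed, makes the result incorrect. In fact, the number of columns determines the result. If I have <= 28 columns, the result is fine. If I have 29 columns, it gives the third-latest record (which is also wrong), and if I have 30-36 columns, it always gives the second-latest record (36 is the total number of columns in table a). In my testing, it did not seem to matter which specific column I removed or added.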

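I have a hard time finding out exactly why the behaviour changes after adding more columns. Also, perhaps by chance, it still gave the correct results yesterday. But today the results suddenly changed, probably after new records (with unrelated identifiers) were added to table a. I have tried using EXPLAIN: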

# The first query, with columns: id, b_id, c_id
id  select_type  table       type  possible_keys  key   key_len  ref   rows  Extra
1   PRIMARY      <derived2>  ALL   NULL           NULL  NULL     NULL  280   Using where; Using temporary; Using filesort
2   DERIVED      a           ALL   NULL           NULL  NULL     NULL  280   Using filesort

# The second query, with columns: id, b_id, c_id, and, some, other, columns, ...
id  select_type  table       type  possible_keys  key   key_len  ref   rows  Extra
1   PRIMARY      <derived2>  ALL   NULL           NULL  NULL     NULL  276   Using where; Using temporary; Using filesort
2   DERIVED      a           ALL   NULL           NULL  NULL     NULL  276   Using filesort

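But that does not really help me, other than that I can see that the key_len is different. The second-latest record that is incorrectly returned by the second query is id = 276, and the actual latest record, which the first query retrieves correctly, is id = 278. There are now 307 rows in total, whereas yesterday there were probably only ~300. I am not sure how to interpret these results to understand what is going wrong. Does anyone know? And if not, what else can I do to find out what is causing these strange results?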

Answer 1

Why not use window functions instead of this dirty workaround, which relies on the non-standard GROUP BY behavior of MySQL/MariaDB?

select *
from (
    select a.*, row_number() over(partition by b_id, c_id order by epoch desc) rn
    from a
) a
where rn = 1

This works in MySQL 8.0 and MariaDB 10.2 or later. In earlier versions, an alternative is a correlated subquery:

select *
from a
where epoch = (select max(a1.epoch) from a a1 where a1.b_id = a.b_id and a1.c_id = a.c_id)
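
One difference between the two queries above is worth keeping in mind: row_number() returns exactly one row per (b_id, c_id) group, whereas the correlated subquery returns every row that shares the maximum epoch. A minimal sketch of a deterministic pick, assuming id is unique in a and that preferring the higher id on a tie is acceptable (both are assumptions, not something stated in the question):

```sql
-- Sketch: deterministic pick when two rows in a group share the same epoch.
-- Assumes id is unique; "id desc" as the tie-breaker is an illustrative choice.
select *
from (
    select a.*,
           row_number() over (
               partition by b_id, c_id
               order by epoch desc, id desc
           ) as rn
    from a
) ranked
where rn = 1;
```

Without the extra sort key, which of the tied rows gets rn = 1 is up to the server.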

Answer 2

This is a malformed query and should produce a syntax error:

SELECT t1.*
FROM (SELECT id, b_id, c_id
      FROM a
      ORDER BY epoch DESC
      LIMIT 18446744073709551615
     ) t1
GROUP BY t1.b_id, t1.c_id;

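Why? You are selecting three columns with no aggregation functions, but the GROUP BY has only two columns. Happily, this is now a syntax error in MySQL with the default settings. Finally! (MySQL accepted this non-standard syntax before version 8.0.)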
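
The default setting in question is the ONLY_FULL_GROUP_BY SQL mode. Below is a small sketch for checking whether it is active in the current session and for switching it on. As far as I know, MariaDB supports this mode but does not enable it by default, which would explain why the query in the question runs at all. The sketch assumes the current mode list is not empty:

```sql
-- Show the SQL modes active for this session.
SELECT @@SESSION.sql_mode;

-- Append ONLY_FULL_GROUP_BY for this session only.
-- Assumes the current mode list is not empty.
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');
```

With the mode on, the GROUP BY query above should be rejected instead of silently returning an arbitrary row per group.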

You can do what you want with a correlated subquery:

select a.*
from a
where a.epoch = (select max(a2.epoch)
                 from a a2
                 where a2.b_id = a.b_id and a2.c_id = a.c_id
                );

With an index on a(b_id, c_id, epoch), this is probably also faster than the aggregation, even if the aggregation did happen to work in some cases.
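
A sketch of creating that index; the index name idx_a_b_c_epoch is made up for illustration:

```sql
-- Composite index so that max(epoch) per (b_id, c_id) can be resolved from the index.
-- The index name is hypothetical.
CREATE INDEX idx_a_b_c_epoch ON a (b_id, c_id, epoch);
```

Running EXPLAIN on the correlated-subquery version before and after creating the index is a simple way to check whether the optimizer actually uses it.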

mysql

sql

sql-order-by

greatest-n-per-group

mariadb