Databases: performance of dynamic vs. static lookups of the latest sample(s) in a table

Question

I have a table that collects temperature samples for different cities, which are identified by city_id. Each sample is stored together with a timestamp.

+----+---------+-------------+---------------------+
| id | city_id | temperature | fetched             |
+----+---------+-------------+---------------------+
|  1 |       1 |          10 | 2016-01-28 00:50:27 |
|  2 |       1 |          12 | 2016-01-27 23:51:45 |
|  3 |       2 |          22 | 2016-01-27 23:52:05 |
|  4 |       2 |          25 | 2016-01-28 00:52:25 |
+----+---------+-------------+---------------------+

If I want to get the latest temperature for every city, I can use a self-join [1]:

SELECT s.* 
FROM sample s 
INNER JOIN (
    SELECT city_id, MAX(fetched) maxFetched
    FROM sample
    GROUP BY city_id
) j 
ON s.city_id = j.city_id AND s.fetched = j.maxFetched;

+----+---------+-------------+---------------------+
| id | city_id | temperature | fetched             |
+----+---------+-------------+---------------------+
|  1 |       1 |          10 | 2016-01-28 00:50:27 |
|  4 |       2 |          25 | 2016-01-28 00:52:25 |
+----+---------+-------------+---------------------+
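To try the self-join without a MySQL server, here is a minimal sketch that runs the same query against an in-memory SQLite database. SQLite is an assumption standing in for MySQL/MariaDB. The timestamps are stored as ISO-8601 text, which sorts chronologically, and an ORDER BY was added only to make the output order deterministic.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL/MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sample (
        id          INTEGER PRIMARY KEY,
        city_id     INTEGER NOT NULL,
        temperature INTEGER NOT NULL,
        fetched     TEXT    NOT NULL  -- ISO-8601 text sorts chronologically
    )
""")
conn.executemany(
    "INSERT INTO sample (id, city_id, temperature, fetched) VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "2016-01-28 00:50:27"),
        (2, 1, 12, "2016-01-27 23:51:45"),
        (3, 2, 22, "2016-01-27 23:52:05"),
        (4, 2, 25, "2016-01-28 00:52:25"),
    ],
)

# The self-join from the question: per-city MAX(fetched), then join back
# to fetch the full row that carries that timestamp.
latest = conn.execute("""
    SELECT s.*
    FROM sample s
    INNER JOIN (
        SELECT city_id, MAX(fetched) AS maxFetched
        FROM sample
        GROUP BY city_id
    ) j
    ON s.city_id = j.city_id AND s.fetched = j.maxFetched
    ORDER BY s.city_id
""").fetchall()

for row in latest:
    print(row)
```

This prints the same two rows as the result table above (ids 1 and 4).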

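Now I'm wondering about performance as the table grows. Suppose I collect one sample per hour from each of, say, 10 cities. After one year the table would contain 10*24*365 = 87,600 samples. Will the running time of the MAX function grow linearly with the input size? In other words, would it be better to have another table, last_sample, that holds a pointer to the latest sample, so that looking up each city's latest temperature takes only constant time (perhaps updated automatically whenever a new sample is added)? The same question applies when the query is abstracted into a view.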

SELECT * FROM last_sample;
+------+---------+-----------+
| id   | city_id | sample_id |
+------+---------+-----------+
|    1 |       1 |         1 |
|    2 |       2 |         4 |
+------+---------+-----------+
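One way the "updated automatically" part could work is an AFTER INSERT trigger that moves the pointer. Below is a minimal sketch in SQLite. The trigger name sample_track_latest and the simplified last_sample layout (city_id as its key, without the separate id column shown above) are hypothetical. In MySQL/MariaDB the trigger would need that dialect's syntax (for example INSERT ... ON DUPLICATE KEY UPDATE instead of SQLite's INSERT OR REPLACE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (
        id          INTEGER PRIMARY KEY,
        city_id     INTEGER NOT NULL,
        temperature INTEGER NOT NULL,
        fetched     TEXT    NOT NULL
    );

    -- Hypothetical pointer table: one row per city, keyed by city_id.
    CREATE TABLE last_sample (
        city_id   INTEGER PRIMARY KEY,
        sample_id INTEGER NOT NULL
    );

    -- Point at the new row unless the city already points at a row that is
    -- at least as new. This also copes with samples arriving out of order.
    CREATE TRIGGER sample_track_latest AFTER INSERT ON sample
    BEGIN
        INSERT OR REPLACE INTO last_sample (city_id, sample_id)
        SELECT NEW.city_id, NEW.id
        WHERE NOT EXISTS (
            SELECT 1
            FROM last_sample l
            JOIN sample s ON s.id = l.sample_id
            WHERE l.city_id = NEW.city_id AND s.fetched >= NEW.fetched
        );
    END;
""")

# Same rows as the question, inserted in id order (id 2 is older than id 1).
conn.executemany(
    "INSERT INTO sample (id, city_id, temperature, fetched) VALUES (?, ?, ?, ?)",
    [
        (1, 1, 10, "2016-01-28 00:50:27"),
        (2, 1, 12, "2016-01-27 23:51:45"),
        (3, 2, 22, "2016-01-27 23:52:05"),
        (4, 2, 25, "2016-01-28 00:52:25"),
    ],
)

pointers = conn.execute(
    "SELECT city_id, sample_id FROM last_sample ORDER BY city_id"
).fetchall()

# The lookup reads one pointer per city instead of aggregating the history.
latest = conn.execute("""
    SELECT s.*
    FROM last_sample l
    JOIN sample s ON s.id = l.sample_id
    ORDER BY l.city_id
""").fetchall()
print(pointers)
print(latest)
```

The trade-off is extra work on every insert in exchange for a cheap read. Whether that beats a well-indexed self-join is something to measure, not assume.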

Thanks!

[1] MySQL get set of data with distinct values

Answer 1

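This is an example of the "groupwise max" problem. Your code is "not too bad". There are examples in here that do it faster and scale better. It also discusses what to do when several rows share the same maximum value, depending on whether or not you want multiple rows back.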
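The tie question matters for the self-join: it returns every row that ties for the maximum. A minimal sketch below contrasts it with a window-function variant that returns exactly one row per city. The tied row (id 2) is hypothetical data. ROW_NUMBER() is a swapped-in technique, not necessarily one of the linked examples, and it assumes window-function support (MySQL 8.0+, MariaDB 10.2+, SQLite 3.25+). Breaking ties by the highest id is an arbitrary choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sample (
        id INTEGER PRIMARY KEY, city_id INTEGER, temperature INTEGER, fetched TEXT
    )
""")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
    (1, 1, 10, "2016-01-28 00:50:27"),
    (2, 1, 11, "2016-01-28 00:50:27"),  # hypothetical tie with id 1
    (3, 2, 22, "2016-01-27 23:52:05"),
    (4, 2, 25, "2016-01-28 00:52:25"),
])

# Self-join: keeps every row whose fetched equals its city's maximum.
ids_self_join = [r[0] for r in conn.execute("""
    SELECT s.id
    FROM sample s
    INNER JOIN (
        SELECT city_id, MAX(fetched) AS maxFetched
        FROM sample
        GROUP BY city_id
    ) j ON s.city_id = j.city_id AND s.fetched = j.maxFetched
    ORDER BY s.id
""")]

# Window function: rank rows within each city, newest first, and keep rank 1.
ids_one_per_city = [r[0] for r in conn.execute("""
    SELECT id
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (
                   PARTITION BY city_id ORDER BY fetched DESC, id DESC
               ) AS rn
        FROM sample s
    ) ranked
    WHERE rn = 1
    ORDER BY city_id
""")]

print(ids_self_join)     # both tied rows for city 1, plus city 2's row
print(ids_one_per_city)  # one row per city
```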

For your code, be sure to have this composite index:

INDEX(city_id, fetched)

Your subquery will use only the index (EXPLAIN SELECT ... will show "Using index"). And, I think, it will hop through the index very efficiently to find the "city, max(fetched)" pairs.
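The index-only claim can be checked with the database's plan output. Here is a minimal sketch of that check in SQLite, whose EXPLAIN QUERY PLAN reports a covering index where MySQL's EXPLAIN says "Using index". The index name idx_city_fetched and the generated rows are hypothetical. This only shows that the subquery is answered from the index; it does not demonstrate MySQL-specific behaviour such as hopping through the index per city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sample (
        id          INTEGER PRIMARY KEY,
        city_id     INTEGER NOT NULL,
        temperature INTEGER NOT NULL,
        fetched     TEXT    NOT NULL
    )
""")
# The suggested INDEX(city_id, fetched), as a separate CREATE INDEX in SQLite.
conn.execute("CREATE INDEX idx_city_fetched ON sample (city_id, fetched)")

# Hypothetical data: 10 cities, hourly samples for 28 days.
rows = [
    (city, 20, f"2016-01-{day:02d} {hour:02d}:00:00")
    for city in range(1, 11)
    for day in range(1, 29)
    for hour in range(24)
]
conn.executemany(
    "INSERT INTO sample (city_id, temperature, fetched) VALUES (?, ?, ?)", rows
)

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT city_id, MAX(fetched) FROM sample GROUP BY city_id
""").fetchall()
plan_text = " ".join(str(r[-1]) for r in plan)
print(plan_text)  # expect a scan "USING COVERING INDEX idx_city_fetched"
```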

Then your JOIN will use the same index to get back to the table.

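Better still: get rid of id (does it really serve any purpose?), since (city_id, fetched) is "unique" (is it?) and could therefore be the PRIMARY KEY. In that case the INDEX I suggested would not be needed, and all the secondary probes would be very efficient (because they use the PK).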
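A minimal sketch of the id-less schema follows, again in SQLite as a stand-in. WITHOUT ROWID makes SQLite store the rows in a B-tree keyed by the primary key, which is the closest analogue here to InnoDB clustering rows by PK. The sketch also shows the composite PK enforcing the "unique" assumption by rejecting a second sample for the same city and second.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No surrogate id: (city_id, fetched) is the primary key.
conn.execute("""
    CREATE TABLE sample (
        city_id     INTEGER NOT NULL,
        fetched     TEXT    NOT NULL,
        temperature INTEGER NOT NULL,
        PRIMARY KEY (city_id, fetched)
    ) WITHOUT ROWID
""")
conn.executemany(
    "INSERT INTO sample (city_id, fetched, temperature) VALUES (?, ?, ?)",
    [
        (1, "2016-01-28 00:50:27", 10),
        (1, "2016-01-27 23:51:45", 12),
        (2, "2016-01-27 23:52:05", 22),
        (2, "2016-01-28 00:52:25", 25),
    ],
)

# The PK rejects a duplicate (city_id, fetched) pair.
try:
    conn.execute("INSERT INTO sample VALUES (1, '2016-01-28 00:50:27', 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Same groupwise-max self-join; both the subquery and the join probe use the PK.
latest = conn.execute("""
    SELECT s.city_id, s.temperature, s.fetched
    FROM sample s
    INNER JOIN (
        SELECT city_id, MAX(fetched) AS maxFetched
        FROM sample
        GROUP BY city_id
    ) j ON s.city_id = j.city_id AND s.fetched = j.maxFetched
    ORDER BY s.city_id
""").fetchall()
print(duplicate_rejected, latest)
```

If two samples for one city can legitimately share a timestamp, this key is not an option, and the question's id plus the composite index is the safer layout.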

Bottom line:

Toss id
Change to PRIMARY KEY(city_id, fetched)
(You won't find anything better in my blog.)
It will scale linearly (as does the output)

But... if you decide to restrict it to fetched BETWEEN ... AND ..., all bets are off. (I would need to rethink things.)
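For reference, here is a minimal sketch of what the restricted query would return: the latest sample per city within a time window, with the BETWEEN filter placed inside the subquery. It addresses correctness only and, like the answer, makes no claim about how this variant performs. The window bounds are hypothetical. The outer query needs no filter of its own because it matches exact (city_id, fetched) pairs that already lie inside the window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sample (
        id INTEGER PRIMARY KEY, city_id INTEGER, temperature INTEGER, fetched TEXT
    )
""")
conn.execute("CREATE INDEX idx_city_fetched ON sample (city_id, fetched)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
    (1, 1, 10, "2016-01-28 00:50:27"),
    (2, 1, 12, "2016-01-27 23:51:45"),
    (3, 2, 22, "2016-01-27 23:52:05"),
    (4, 2, 25, "2016-01-28 00:52:25"),
])

# Hypothetical window: all of 2016-01-27.
window = ("2016-01-27 00:00:00", "2016-01-27 23:59:59")

latest_in_window = conn.execute("""
    SELECT s.*
    FROM sample s
    INNER JOIN (
        SELECT city_id, MAX(fetched) AS maxFetched
        FROM sample
        WHERE fetched BETWEEN ? AND ?
        GROUP BY city_id
    ) j ON s.city_id = j.city_id AND s.fetched = j.maxFetched
    ORDER BY s.city_id
""", window).fetchall()
print(latest_in_window)
```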

Tags: mysql, database-design, mariadb