ClickHouse: select the last record without running max() over the whole table
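I have a table with billions of rows holding values for 4k mutable parameters, and I need the latest value for 500 of them.
The table is partitioned by day and ordered by parameter ID, so I only need to find the last record with the required ID.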
SELECT max(time)
FROM obj_ntgres.param_values_history
PREWHERE param_id = 4171
This is slow:
Elapsed: 0.437 sec. Processed 2.56 million rows, 5.21 MB (5.87 million rows/s., 11.92 MB/s.)
SELECT *
FROM obj_ntgres.param_values_history
PREWHERE param_id = 4171
ORDER BY time DESC
LIMIT 1
This is even slower:
1 rows in set. Elapsed: 3.413 sec. Processed 2.56 million rows, 5.45 MB (751.21 thousand rows/s., 1.60 MB/s.)
Table
CREATE TABLE obj_ntgres.param_values_history (
time DateTime,
param_id UInt16,
param_value Float32,
param_value_quality Decimal(1, 0),
msec Decimal(3, 0)
) ENGINE = MergeTree
PARTITION BY toStartOfDay(time)
ORDER BY param_id
SETTINGS index_granularity = 8192
Do you have any ideas on how to make this faster?
What I mean is: find the last element without running max() over the whole table.
I don't understand what you mean by "work bad". But if the problem is
select last record with specific where
you can try this (modify it to suit your needs):
SELECT
max((time, param_value, param_value_quality, msec)) AS result,
result.2 AS param_value,
result.3 AS param_value_quality
FROM obj_ntgres.param_values_history
PREWHERE param_id = 4171
In practice this still has to read every row with the same param_id, which is why so much data gets scanned.
There are a few possible approaches. In all cases, you first need to add the time column to the table's sorting key:
CREATE TABLE param_values_history (
time DateTime,
param_id UInt16,
param_value Float32,
param_value_quality Decimal(1, 0),
msec Decimal(3, 0)
) ENGINE = MergeTree
PARTITION BY toStartOfDay(time)
ORDER BY (param_id, time)
SETTINGS index_granularity = 8192
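Moving existing data into the re-keyed table could look roughly like the sketch below. It assumes the new table was created under a hypothetical name, obj_ntgres.param_values_history_v2, and that the data is copied one month at a time so a single INSERT does not touch too many daily partitions; the dates are placeholders.

```sql
-- Sketch only: param_values_history_v2 is a hypothetical name for the table
-- created with ORDER BY (param_id, time); the date range is a placeholder.
-- Copy one month per statement to keep the number of partitions per INSERT small.
INSERT INTO obj_ntgres.param_values_history_v2
SELECT *
FROM obj_ntgres.param_values_history
WHERE time >= toDateTime('2020-01-01 00:00:00')
  AND time <  toDateTime('2020-02-01 00:00:00');

-- After all ranges are copied and verified, swap the table names.
RENAME TABLE
    obj_ntgres.param_values_history    TO obj_ntgres.param_values_history_old,
    obj_ntgres.param_values_history_v2 TO obj_ntgres.param_values_history;
```

Repeat the INSERT for each month before running the RENAME.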
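After that, if your data is time-aligned, i.e. you know for certain that all 500 parameters have some value within the last few seconds or minutes, you can add a filter such as AND time > now() - INTERVAL 10 MINUTE
and it will run very fast (no need to scan many rows).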
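As a rough sketch of that time-bounded query, assuming the table already uses ORDER BY (param_id, time): the three IDs stand in for the 500 you actually need, and the 10-minute window is just the example value from above.

```sql
-- Latest value per parameter, restricted to a recent time window.
-- Assumption: every requested param_id has at least one row in the window;
-- parameters without recent rows are simply absent from the result.
SELECT
    param_id,
    max(time)                         AS last_time,
    argMax(param_value, time)         AS last_value,
    argMax(param_value_quality, time) AS last_quality
FROM param_values_history
WHERE param_id IN (4171, 4172, 4173)   -- placeholder IDs
  AND time > now() - INTERVAL 10 MINUTE
GROUP BY param_id
```

Because the table is partitioned by toStartOfDay(time), the time condition also lets ClickHouse skip older daily partitions entirely.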
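If some of your parameters do not report regularly, things get worse.
In that case, the fastest approach is to cache the last timestamp for each parameter, or even the whole last row, in a materialized view. Something like this: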
CREATE MATERIALIZED VIEW last_positions
Engine=ReplacingMergeTree(max_time)
ORDER BY param_id
PARTITION BY tuple()
AS SELECT param_id, max(time) as max_time
FROM param_values_history
GROUP BY param_id;
SELECT * FROM param_values_history PREWHERE (param_id,time) IN (SELECT param_id, max(max_time) FROM last_positions GROUP BY param_id);
Or collect the whole last row in the MV:
CREATE MATERIALIZED VIEW last_positions
Engine=ReplacingMergeTree(max_time)
ORDER BY param_id
PARTITION BY tuple()
AS SELECT param_id,
argMax(param_value, time) as _param_value,
argMax(param_value_quality, time) as _param_value_quality,
argMax(msec, time) as _msec,
max(time) as max_time
FROM param_values_history
GROUP BY param_id;
SELECT * FROM last_positions FINAL;
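To read only the parameters you care about from that view, a filter on param_id can be added. This is a sketch with placeholder IDs; the column names are the ones defined in the MV above.

```sql
-- Read the cached last row for a subset of parameters.
-- FINAL collapses rows that ReplacingMergeTree has not merged yet,
-- keeping the row with the largest max_time per param_id.
SELECT
    param_id,
    max_time,
    _param_value,
    _param_value_quality,
    _msec
FROM last_positions FINAL
WHERE param_id IN (4171, 4172, 4173)   -- placeholder IDs
```

Keep in mind that a materialized view only sees rows inserted after it was created, so parameters with no new inserts since then will be missing unless the view is backfilled (for example with POPULATE at creation time).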