Aggregation over aggregation
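I have time-series data (stock exchange trades) that I need to aggregate by time interval: 1 minute, 5 minutes, 15 minutes, and so on.
Higher timeframes can be computed from lower ones, i.e. 5 x 1 minute -> 5 minutes.
I created a MATERIALIZED VIEW with the AggregatingMergeTree engine and successfully calculated m1, like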
maxState(price) as price_high, countState(item_id) as trades_count
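But I don't know how to build the next timeframe. If I use maxMerge in the next view, I get an incorrect result, which is expected, because the docs say I must use the -State suffix in AggregatingMergeTree. When I use -State in m5, it also complains with an error.
I want to build a chain of materialized views in which the lower-timeframe view pipes updates from trades into the higher-timeframe views.
Update (SQL):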
CREATE MATERIALIZED VIEW IF NOT EXISTS candle_m1_state
ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(toDateTime(timestamp_close_m1/1000))
ORDER BY (platform_id, symbol, timestamp_close_m1)
POPULATE AS
select
platform_id as platform_id,
symbol as symbol,
'1m' as `candle_interval`,
1000*toUnixTimestamp(toStartOfMinute(toDateTime(timestamp/1000))) as timestamp_m1,
1000*toUnixTimestamp(addMinutes(toStartOfMinute(toDateTime(timestamp/1000)), 1)) as timestamp_close_m1,
...
minState(price) as price_low,
countState(item_id) as trades_count
from trade
group by platform_id, symbol, timestamp_m1, timestamp_close_m1, `candle_interval`
order by timestamp_close_m1;
/* The one below is definitely wrong due to the -State suffix */
CREATE MATERIALIZED VIEW IF NOT EXISTS candle_m5_test
ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(toDateTime(timestamp_close_m5 / 1000))
ORDER BY (platform_id, symbol, timestamp_close_m5) SETTINGS index_granularity = 8192
POPULATE AS
SELECT platform_id, symbol, '5m' AS candle_interval,
1000 * toUnixTimestamp(toStartOfFiveMinute(toDateTime(timestamp_m1 / 1000))) AS timestamp_m5,
1000 * toUnixTimestamp(addMinutes(toStartOfFiveMinute(toDateTime(timestamp_m1 / 1000)), 5)) AS timestamp_close_m5,
...
minState(price_low) AS price_low,
countState(trades_count) AS trades_count
FROM candle_m1_state
GROUP BY platform_id, symbol, timestamp_m5, timestamp_close_m5
ORDER BY platform_id ASC, symbol ASC, timestamp_close_m5 ASC;
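I wouldn't try to chain views. I would create one view per aggregation.
Also keep in mind that a MATERIALIZED VIEW is more of a trigger than a view.
I would recommend: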
CREATE MATERIALIZED VIEW
stream__source__target_5m TO target_5m
AS
SELECT ...
CREATE MATERIALIZED VIEW
stream__source__target_1m TO target_1m
AS
SELECT ...
and so on, where target_xm is your target table.
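A minimal sketch of that layout for the 5-minute timeframe, built directly from the raw trades. The source table trade and its columns come from the question; the view name stream__trade__target_5m is a hypothetical instance of the naming pattern above, and the column types (UInt32, String, Float64, UInt64) are assumptions that must be adjusted to the real trade schema.

```sql
-- Target table that stores partial aggregate states per 5-minute candle.
-- Column types are assumptions; they must match the argument types in `trade`.
CREATE TABLE IF NOT EXISTS target_5m
(
    platform_id  UInt32,
    symbol       String,
    timestamp_m5 UInt64,                            -- candle open time, ms since epoch
    price_high   AggregateFunction(max, Float64),   -- assumes price is Float64
    price_low    AggregateFunction(min, Float64),
    trades_count AggregateFunction(count, UInt64)   -- assumes item_id is UInt64
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(toDateTime(timestamp_m5 / 1000))
ORDER BY (platform_id, symbol, timestamp_m5);

-- The view acts like an insert trigger on `trade`:
-- every inserted block is aggregated and written to target_5m.
CREATE MATERIALIZED VIEW IF NOT EXISTS stream__trade__target_5m TO target_5m
AS
SELECT
    platform_id,
    symbol,
    1000 * toUnixTimestamp(toStartOfFiveMinute(toDateTime(timestamp / 1000))) AS timestamp_m5,
    maxState(price)     AS price_high,
    minState(price)     AS price_low,
    countState(item_id) AS trades_count
FROM trade
GROUP BY platform_id, symbol, timestamp_m5;
```

Because the view runs once per inserted block, target_5m can hold several partial rows for the same candle until background merges combine them, so reads should still aggregate with the -Merge functions and a GROUP BY on the candle key.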
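Given the query time of a SELECT over a chain of materialized views, I want to stick with that solution rather than create a view from the raw data for each timeframe (TF) aggregation.
So the solution is:
raw data -> TF1 materialized view (AggregatingMergeTree, -State suffix) -> TF2 (from TF1) (AggregatingMergeTree, -MergeState suffix)
Then query any of TF1, TF2, ... with the -Merge suffix.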
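The chain above could be sketched roughly as follows for the m1 -> m5 step. This is an illustration under assumptions, not the exact statements from the post: the table names candle_m1 and candle_m5 and the view name candle_m5_mv are hypothetical, explicit target tables (TO ...) are used so that the second view reads from a regular table, and the column types are guesses. candle_m1 is assumed to be filled by a view that uses the -State functions on the raw trades.

```sql
-- TF1 storage (assumed to be filled from `trade` with maxState/minState/countState).
CREATE TABLE IF NOT EXISTS candle_m1
(
    platform_id  UInt32,
    symbol       String,
    timestamp_m1 UInt64,                            -- candle open time, ms since epoch
    price_high   AggregateFunction(max, Float64),
    price_low    AggregateFunction(min, Float64),
    trades_count AggregateFunction(count, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(toDateTime(timestamp_m1 / 1000))
ORDER BY (platform_id, symbol, timestamp_m1);

-- TF2 storage: same state columns, keyed by the 5-minute candle.
CREATE TABLE IF NOT EXISTS candle_m5
(
    platform_id  UInt32,
    symbol       String,
    timestamp_m5 UInt64,
    price_high   AggregateFunction(max, Float64),
    price_low    AggregateFunction(min, Float64),
    trades_count AggregateFunction(count, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(toDateTime(timestamp_m5 / 1000))
ORDER BY (platform_id, symbol, timestamp_m5);

-- TF1 -> TF2: -MergeState combines existing states and returns a state again,
-- which is what an AggregatingMergeTree column expects.
CREATE MATERIALIZED VIEW IF NOT EXISTS candle_m5_mv TO candle_m5
AS
SELECT
    platform_id,
    symbol,
    1000 * toUnixTimestamp(toStartOfFiveMinute(toDateTime(timestamp_m1 / 1000))) AS timestamp_m5,
    maxMergeState(price_high)     AS price_high,
    minMergeState(price_low)      AS price_low,
    countMergeState(trades_count) AS trades_count
FROM candle_m1
GROUP BY platform_id, symbol, timestamp_m5;

-- Reading any timeframe: -Merge finalizes the states into plain values.
SELECT
    platform_id,
    symbol,
    timestamp_m5,
    maxMerge(price_high)     AS price_high,
    minMerge(price_low)      AS price_low,
    countMerge(trades_count) AS trades_count
FROM candle_m5
GROUP BY platform_id, symbol, timestamp_m5
ORDER BY platform_id, symbol, timestamp_m5;
```

The distinction between the three suffixes is the point of the design: -State turns raw values into a state, -MergeState turns states into a combined state, and -Merge turns states into a final value. Applying -State to a column that already holds states (as in candle_m5_test in the question) or storing the output of -Merge in a state column mixes those levels up.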