Can I insert an aggregation state into an AggregatingMergeTree without going through a materialized view?
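I have been using AggregatingMergeTree for a while, populating the tables through materialized views that generate aggregation states from raw data. This works well.
I would like to know whether there is a way to generate aggregation states from the arrays I receive, without going through a table that holds one row per data point.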
Example:
I found that I can generate aggregation states from arrays this way:
SELECT countState([1,2,3,4,5])
SELECT uniqState(['user1', 'user2', 'user3'])
# Does not work with quantiles
However, I have not managed to insert them directly into an AggregatingMergeTree.
Example:
CREATE TABLE state_test_agg(
id UInt64,
count_state AggregateFunction(count, UInt32),
uniq_state AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree()
ORDER BY (id)
INSERT INTO state_test_agg (id, count_state, uniq_state)
VALUES (1, countState([1,2,3,4]), uniqState(['a', 'b', 'b']))
Code: 184. DB::Exception: Aggregate function countState([1, 2, 3, 4]) is found in wrong place in query
If instead I first store the buckets (arrays) in a table and then try to move them into the AggregatingMergeTree, I get:
CREATE TABLE state_test_raw(
id UInt64,
count_bucket Array(UInt64),
uniq_bucket Array(String)
)
ENGINE MergeTree
ORDER BY id
INSERT INTO state_test_raw (
id,
count_bucket,
uniq_bucket)
VALUES
(1, [1,2,3,4,5,5], ['a', 'b', 'c'])
-- so far so good
INSERT INTO state_test_agg
VALUES (id, count_state, uniq_state)
SELECT
id,
countState(count_bucket) as count_state,
uniqState(uniq_bucket) as uniq_state
FROM state_test_raw
GROUP BY id
Code: 47. DB::Exception: Missing columns: 'id' while processing query: 'id', required columns: 'id', source columns: '_dummy'
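The error is not very clear, but I suspect the problem is that ClickHouse cannot know whether the id column is unique.
Is there a way to insert aggregation states generated from arrays rather than through a materialized view?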
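Scenario:
I want to record metrics and build summary tables on top of them. The straightforward approach is to store each data point as a row in a MergeTree and then create a materialized view that produces the summaries by populating an AggregatingMergeTree.
Example: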
timestamp, response_time, 200
timestamp2, response_time, 205
...
However, these data points are already bucketed at the source (hence the arrays):
time_bucket_1, response_time, [200, 205]
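I could expand them and write one row per data point, but it seems I could skip that step and build the aggregation states directly from the buckets.
Thanks.
Filippo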
Consider using arrayReduce to convert an array of items into an aggregation state:
WITH
arrayReduce('countState', [1, 2, 3, 4, 5]) AS state_1,
arrayReduce('uniqState', ['user1', 'user2', 'user3', 'user1']) AS state_2
SELECT
countMerge(state_1) AS merge_1,
uniqMerge(state_2) AS merge_2
/*
┌─merge_1─┬─merge_2─┐
│ 5 │ 3 │
└─────────┴─────────┘
*/
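The question also mentions that the direct approach does not work with quantiles. A minimal sketch, assuming `arrayReduce` accepts a parametric aggregate function in its name string (the ClickHouse documentation shows this form with `'uniqUpTo(3)'`); the levels 0.5 and 0.9 and the sample response times are illustrative only. The last two queries contrast the question's `countState([1,2,3,4,5])`, which receives the whole array as a single argument value and so counts one row, with `arrayReduce`, which feeds the function one element at a time.

```sql
-- Quantiles state built from one bucket of response times
-- (levels and values are illustrative, not taken from the question)
WITH arrayReduce('quantilesState(0.5, 0.9)', [toUInt32(200), 205, 210]) AS q_state
SELECT quantilesMerge(0.5, 0.9)(q_state) AS response_time_quantiles;

-- The array is one argument value here, so this counts a single row
SELECT finalizeAggregation(countState([1, 2, 3, 4, 5]));

-- arrayReduce aggregates over the elements, so this counts the five items
SELECT finalizeAggregation(arrayReduce('countState', [1, 2, 3, 4, 5]));
```

To store such states, the AggregatingMergeTree column would be declared as `AggregateFunction(quantiles(0.5, 0.9), UInt32)`.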
INSERT INTO state_test_agg (id, count_state, uniq_state)
VALUES
(1,
arrayReduce('countState', [toUInt32(1),2,3,4]),
arrayReduce('uniqState', ['a', 'b', 'b'])),
(1,
arrayReduce('countState', [toUInt32(5),5]),
arrayReduce('uniqState', ['a', 'b', 'b','c']));
SELECT
id,
countMerge(count_state) AS c,
uniqMerge(uniq_state) AS u
FROM state_test_agg
GROUP BY id
/*
┌─id─┬─c─┬─u─┐
│ 1 │ 6 │ 3 │
└────┴───┴───┘
*/
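The same idea should also cover the second attempt in the question, moving rows from `state_test_raw` into `state_test_agg`. A sketch, not tested here: no `GROUP BY` is needed because `arrayReduce` builds one state per source row, and the AggregatingMergeTree combines states with the same `id` during merges. The `Code: 47` error in the question most likely comes from the stray `VALUES (id, count_state, uniq_state)` line, which ClickHouse tries to evaluate as a row of values that refers to a nonexistent `id` column, rather than from any uniqueness concern.

```sql
-- Column list directly after the table name; no VALUES clause before the SELECT
INSERT INTO state_test_agg (id, count_state, uniq_state)
SELECT
    id,
    -- count_bucket is Array(UInt64); convert to UInt32 so the state matches
    -- AggregateFunction(count, UInt32), mirroring the toUInt32 in the answer above
    arrayReduce('countState', arrayMap(x -> toUInt32(x), count_bucket)) AS count_state,
    -- uniq_bucket is already Array(String), matching AggregateFunction(uniq, String)
    arrayReduce('uniqState', uniq_bucket) AS uniq_state
FROM state_test_raw;
```

The `countMerge`/`uniqMerge` query with `GROUP BY id` shown in the answer reads the result back unchanged.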