Find most occurring value in a group
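I want to find the most frequently occurring value in each group.
I tried using top(k)(column) but got the following error:
Column class is not under aggregate function and not in GROUP BY.
For example:
If I have a table test_date with columns (pid, value):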
pid, value
----------
1,a
1,b
1,a
1,c
I want the result:
pid, value
----------
1,a
I tried SELECT pid,top(1)(value) top_value FROM test_data group by pid
I get the error:
Column value is not under aggregate function and not in GROUP BY
I also tried anyHeavy(),
but it only works for values that occur in more than half of the cases.
This query should help you:
SELECT
    pid,
    /*
    The expression below works in three steps:
    1. groupArray((value, count)): collapse the rows that share a 'pid' into an array of tuples (value, count)
    2. arrayReverseSort: sort that array in descending order by 'count' ('x.2' is 'count')
    3. [1].1: take the 'value' from the first item of the sorted array
    */
    arrayReverseSort(x -> x.2, groupArray((value, count)))[1].1 AS value
FROM
(
    SELECT
        pid,
        value,
        count() AS count
    FROM test_date
    GROUP BY
        pid,
        value
)
GROUP BY pid
ORDER BY pid ASC
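The same two-level idea (count per (pid, value), then pick the value with the highest count per pid) can probably be written more compactly with argMax. This is only a sketch, not the answerer's query: the alias cnt is my own name, and when two values tie for the top count argMax returns one of them without any guaranteed order.

```sql
SELECT
    pid,
    -- argMax(arg, val) returns the 'arg' seen at the maximum 'val',
    -- i.e. the value with the highest per-group count
    argMax(value, cnt) AS value
FROM
(
    -- inner query: one row per (pid, value) with its number of occurrences
    SELECT
        pid,
        value,
        count() AS cnt
    FROM test_date
    GROUP BY
        pid,
        value
)
GROUP BY pid
ORDER BY pid ASC
```

For the sample rows (1,a), (1,b), (1,a), (1,c), the inner query yields counts a=2, b=1, c=1, so the outer query should return a for pid 1.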
SELECT pid,topK(1)(value) top_value FROM test_data group by pid
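Two things to keep in mind with the topK version: as far as I know, topK returns an array rather than a scalar, and its result is approximate. If a plain value per pid is wanted, indexing the array with [1] should do it. This is a sketch under those assumptions, using the same table name as the query above:

```sql
SELECT
    pid,
    -- topK(1)(value) yields a one-element array such as ['a'];
    -- [1] takes the first element (ClickHouse arrays are 1-based)
    topK(1)(value)[1] AS top_value
FROM test_data
GROUP BY pid
```

If exact counts matter, the subquery approach with count() is the safer choice.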