Postgres double group by repeating attribute
I have a table with columns id, user_id, message_id, message_type; for example:
id: 1, user_id: 1, message_id: 4, message_type: 'Warning'
id: 2, user_id: 1, message_id: 5, message_type: 'Warning'
id: 3, user_id: 1, message_id: 6, message_type: 'Warning'
id: 4, user_id: 2, message_id: 4, message_type: 'Error'
id: 5, user_id: 2, message_id: 1, message_type: 'Exception'
id: 6, user_id: 1, message_id: 2, message_type: 'Exception'
id: 7, user_id: 1, message_id: 3, message_type: 'Exception'
id: 8, user_id: 2, message_id: 4, message_type: 'Exception'
I want grouped results like a news feed in a social network: consecutive rows that repeat the same user_id and message_type should be collapsed into one entry. I also need LIMIT 20 ORDER BY id DESC.
Example:
id: 8, user_id: 2, message_id: 4, message_type: 'Exception'
id: {6,7} user_id: 1, message_id: {2,3}, message_type: 'Exception'
id: 5, user_id: 2, message_id: 1, message_type: 'Exception'
id: 4, user_id: 2, message_id: 4, message_type: 'Error'
id: {1, 2, 3}, user_id: 1, message_id: {4, 5, 6}, message_type: 'Warning'
How can I do this with the best performance?
The array_agg function should do the trick:
SELECT user_id,
       message_type,
       ARRAY_AGG(DISTINCT id),
       ARRAY_AGG(DISTINCT message_id)
FROM mytable
GROUP BY user_id, message_type;
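Note that this collapses every row with the same (user_id, message_type), not only consecutive runs. If you also need the ORDER BY id DESC / LIMIT 20 from the question, a minimal sketch (assuming groups should be ordered by their newest id) could look like:
SELECT user_id,
       message_type,
       ARRAY_AGG(DISTINCT id)         AS record_ids,
       ARRAY_AGG(DISTINCT message_id) AS message_ids
FROM mytable
GROUP BY user_id, message_type
ORDER BY MAX(id) DESC   -- newest group first
LIMIT 20;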
I only found one way to do it:
- use the window function lag() to find the rows where the pair (user_id, message_type) changes,
- use the window function sum() to turn those change flags into a running group number,
- group by that number and select whatever you need.
Checking it:
create table test (
    id serial primary key,
    user_id integer,
    message_id integer,
    message_type varchar
);
insert into test (user_id, message_id, message_type)
values
(1, 4, 'Warning'),
(1, 5, 'Warning'),
(1, 6, 'Warning'),
(2, 4, 'Error'),
(2, 1, 'Exception'),
(1, 2, 'Exception'),
(1, 3, 'Exception'),
(2, 4, 'Exception')
;
select
    array_agg(grouped.id) as record_ids,
    grouped.user_id,
    array_agg(grouped.message_id) as message_ids,
    grouped.message_type
from (
    select changed.*,
           -- running sum of change flags = group number per consecutive run
           sum(changed.changed) over (order by changed.id desc) as group_n
    from (
        select tt.*,
               -- 1 when (user_id, message_type) differs from the previous row
               -- (walking the table in id desc order), otherwise 0
               case when lag((user_id, message_type)) over (order by tt.id desc)
                         is distinct from (user_id, message_type)
                    then 1 else 0
               end as changed
        from test tt
    ) changed
    order by id desc
) grouped
group by grouped.group_n, grouped.user_id, grouped.message_type
order by grouped.group_n
;
Result:
record_ids | user_id | message_ids | message_type
------------+---------+-------------+--------------
{8} | 2 | {4} | Exception
{7,6} | 1 | {3,2} | Exception
{5} | 2 | {1} | Exception
{4} | 2 | {4} | Error
{3,2,1} | 1 | {6,5,4} | Warning
(5 rows)
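The LIMIT 20 from the question can simply be appended after the final ORDER BY: group_n grows as id falls, so the first 20 groups are the 20 newest ones. A sketch of the same query with the cap added (not part of the test run above):
select
    array_agg(grouped.id) as record_ids,
    grouped.user_id,
    array_agg(grouped.message_id) as message_ids,
    grouped.message_type
from (
    select changed.*,
           sum(changed.changed) over (order by changed.id desc) as group_n
    from (
        select tt.*,
               case when lag((user_id, message_type)) over (order by tt.id desc)
                         is distinct from (user_id, message_type)
                    then 1 else 0
               end as changed
        from test tt
    ) changed
    order by id desc
) grouped
group by grouped.group_n, grouped.user_id, grouped.message_type
order by grouped.group_n
limit 20;  -- only the 20 newest groups
Keep in mind that the window functions still scan the whole table before the limit is applied, so on a large table you would typically restrict the candidate rows in the innermost subquery first (for example to the most recent few hundred ids).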