PostgreSQL: double GROUP BY on a repeating attribute

I have a table with the columns id, user_id, message_id, and message_type; for example:

  id: 1, user_id: 1, message_id: 4, message_type: 'Warning'
  id: 2, user_id: 1, message_id: 5, message_type: 'Warning'
  id: 3, user_id: 1, message_id: 6, message_type: 'Warning'
  id: 4, user_id: 2, message_id: 4, message_type: 'Error'
  id: 5, user_id: 2, message_id: 1, message_type: 'Exception'
  id: 6, user_id: 1, message_id: 2, message_type: 'Exception'
  id: 7, user_id: 1, message_id: 3, message_type: 'Exception'
  id: 8, user_id: 2, message_id: 4, message_type: 'Exception'

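I want a grouped result like a news feed in a social network: consecutive rows that repeat the same user_id and message_type should be collapsed into one group. The query also needs ORDER BY id DESC and LIMIT 20. Example: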

  id: 8, user_id: 2, message_id: 4, message_type: 'Exception'
  id: {6,7}, user_id: 1, message_id: {2,3}, message_type: 'Exception'
  id: 5, user_id: 2, message_id: 1, message_type: 'Exception'
  id: 4, user_id: 2, message_id: 4, message_type: 'Error'
  id: {1, 2, 3}, user_id: 1, message_id: {4, 5, 6}, message_type: 'Warning'

How can I do this with the best performance?

The array_agg function should do the trick:

SELECT   user_id, 
         message_type, 
         ARRAY_AGG (DISTINCT id), 
         ARRAY_AGG (DISTINCT message_id)
FROM     mytable
GROUP BY user_id, message_type

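I found only one way:

  1. Use the window function lag() to find the points where the (user_id, message_type) pair changes
  2. Use the window function sum() to assign a sequence number to each new run of that pair
  3. Group by that sequence number and select what you need:

Testing it: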

create table test (
  id serial primary key,
  user_id integer,
  message_id integer,
  message_type varchar
);

insert into test (user_id, message_id, message_type) 
values 
(1, 4, 'Warning'),
(1, 5, 'Warning'),
(1, 6, 'Warning'),
(2, 4, 'Error'),
(2, 1, 'Exception'),
(1, 2, 'Exception'),
(1, 3, 'Exception'),
(2, 4, 'Exception')
;

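select 
    -- order inside the aggregate so the array contents are deterministic
    array_agg(grouped.id order by grouped.id desc) as record_ids,
    grouped.user_id,
    array_agg(grouped.message_id order by grouped.id desc) as message_ids,
    grouped.message_type
from (
    -- running total of the change flags: all rows of one consecutive run share a group_n
    select changed.*, 
        sum(changed.changed) over (order by changed.id desc) as group_n
    from (
        -- changed = 1 when (user_id, message_type) differs from the previous row in id-desc order
        select tt.*,
            case when lag((user_id, message_type)) over (order by tt.id desc) is distinct from (user_id, message_type) then 1 else 0 end as changed
        from test tt
    ) changed
) grouped
group by grouped.group_n, grouped.user_id, grouped.message_type
order by grouped.group_n
;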

Result:

record_ids | user_id | message_ids | message_type 
------------+---------+-------------+--------------
 {8}        |       2 | {4}         | Exception
 {7,6}      |       1 | {3,2}       | Exception
 {5}        |       2 | {1}         | Exception
 {4}        |       2 | {4}         | Error
 {3,2,1}    |       1 | {6,5,4}     | Warning
(5 rows)
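
The question also asks for LIMIT 20, which the query above leaves out. Here is a minimal sketch of one way to add it, using the same lag()/sum() technique rewritten with CTEs. It assumes the limit is meant to cap the number of groups returned rather than the number of underlying rows. The CTE names flagged and numbered are made up for illustration.

```sql
-- Sketch only: same technique as above, plus the LIMIT from the question.
-- Assumption: LIMIT 20 should apply to groups, not to individual rows.
with flagged as (
    -- changed = 1 on the first row of each consecutive run, in id-desc order
    select t.*,
           case when lag((user_id, message_type)) over (order by id desc)
                     is distinct from (user_id, message_type)
                then 1 else 0 end as changed
    from test t
),
numbered as (
    -- running total of the flags: one group_n per consecutive run
    select f.*,
           sum(changed) over (order by id desc) as group_n
    from flagged f
)
select array_agg(id order by id desc)         as record_ids,
       user_id,
       array_agg(message_id order by id desc) as message_ids,
       message_type
from numbered
group by group_n, user_id, message_type
order by group_n      -- group 1 holds the newest rows
limit 20;
```

To see how the groups are formed, the final select can be swapped for `select id, user_id, message_type, changed, group_n from numbered order by id desc`. Note that the window functions still run over every row of the table before the limit is applied, so on a large table it may be worth restricting the innermost scan to recent rows first.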