Postgres: limit the number of rows per id when filtering with WHERE IN on ids from another table

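I have a messaging app where I need to return all the conversations a user is part of, along with the messages associated with each conversation. I would like to limit the number of messages per conversation.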

The table structure is as follows:

Users

| id   | name | email    | created_at |
|------|------|----------|------------|
| 1    | Bob  | a@b.com  | timestamp  |
| 2    | Tom  | b@b.com  | timestamp  |
| 3    | Mary | c@b.com  | timestamp  |

Messages

| id   | sender_id | conversation_id  | message | created_at |
|------|-----------|------------------|---------|------------|
| 1    | 1         | 1                | text    | timestamp  |
| 2    | 2         | 2                | text    | timestamp  |
| 3    | 2         | 1                | text    | timestamp  |
| 4    | 3         | 3                | text    | timestamp  |

Conversations

| id | created_at |
|----|------------|
| 1  | timestamp  |
| 2  | timestamp  |
| 3  | timestamp  |

Conversations_Users

| id | user_id | conversation_id |
|----|---------|-----------------|
| 1  | 1       | 1               |
| 2  | 2       | 1               |
| 3  | 2       | 2               |
| 4  | 3       | 2               |
| 5  | 3       | 3               |
| 6  | 1       | 3               |

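I want to load all the conversations that the user (id 1) is in (in the example, conversations 1 and 3). For each conversation I need the messages associated with it, grouped by conversation_id and ordered by created_at ASC. My current query handles this: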

SELECT
    *
FROM
    messages
WHERE
    conversation_id IN (
        SELECT
            conversation_id
        FROM
            conversations_users
        WHERE
            user_id = 1
    )
ORDER BY
    conversation_id, created_at ASC;

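However, this loads a lot of data into memory, so I would like to limit the number of messages per conversation.

I have looked at rank() and ROW_NUMBER(), but I am not sure how to implement them or whether they are what I need.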

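Here is an example that uses row_number() to limit each user to 100 conversations, taking the most recent conversations first (descending conversation_id). Note that it caps the number of conversations per user, not the number of messages per conversation.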

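select t1.*
from messages t1
inner join (
    -- number each user's conversations, newest (highest id) first
    select row_number() over (partition by user_id order by conversation_id desc) as rn,
           conversation_id, user_id
    from conversations_users
) t2 on t1.conversation_id = t2.conversation_id  -- messages has no user_id column
where t2.user_id = 1
  and t2.rn <= 100
order by t1.created_at asc;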

You can indeed use row_number(). The following query gives you the last 10 messages of each conversation for a given user:

select *
from (
    select 
        m.*, 
        row_number() over(
            partition by cu.user_id, m.conversation_id 
            order by m.created_at desc
        ) rn
    from messages m
    inner join conversations_users cu 
        on  cu.conversation_id  = m.conversation_id 
        and cu.user_id = 1
) t
where rn <= 10
order by conversation_id, created_at desc

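Notes:

  • I turned the subquery with in into a regular join, since I think it is a neater way to express your requirement

  • I included the user id in the partition clause; so if you remove the condition that filters on the user (cu.user_id = 1), you get the last 10 messages of each conversation for every user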
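To check the behaviour of the row_number() query without a Postgres instance, here is a minimal sketch that runs the same statement against an in-memory SQLite database from Python. SQLite is only a stand-in (it needs version 3.25 or newer for window functions), the integer created_at values and the five messages per conversation are made-up test data, and `latest_messages` is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption); window
# functions require SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER,
        conversation_id INTEGER,
        message TEXT,
        created_at INTEGER  -- integer stand-in for a timestamp
    );
    CREATE TABLE conversations_users (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        conversation_id INTEGER
    );
    -- same memberships as the question's Conversations_Users table
    INSERT INTO conversations_users (user_id, conversation_id) VALUES
        (1, 1), (2, 1), (2, 2), (3, 2), (3, 3), (1, 3);
""")

# Hypothetical data: five messages in each of conversations 1, 2 and 3,
# with created_at increasing from 1 to 5 inside every conversation.
rows = [(1 + (i % 3), conv, f"msg {i}", i)
        for conv in (1, 2, 3) for i in range(1, 6)]
conn.executemany(
    "INSERT INTO messages (sender_id, conversation_id, message, created_at) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

QUERY = """
    SELECT conversation_id, message, created_at
    FROM (
        SELECT m.*,
               row_number() OVER (
                   PARTITION BY cu.user_id, m.conversation_id
                   ORDER BY m.created_at DESC
               ) AS rn
        FROM messages m
        INNER JOIN conversations_users cu
            ON cu.conversation_id = m.conversation_id
           AND cu.user_id = ?
    ) t
    WHERE rn <= ?
    ORDER BY conversation_id, created_at DESC
"""


def latest_messages(conn, user_id, per_conversation):
    """Return the newest `per_conversation` messages of each of the user's conversations."""
    return conn.execute(QUERY, (user_id, per_conversation)).fetchall()


result = latest_messages(conn, 1, 2)
for row in result:
    print(row)
```

With this data, user 1 gets only the two newest messages from conversations 1 and 3, and nothing from conversation 2, which is the shape of result the question asks for.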

You can use ROW_NUMBER() to limit the number of messages per conversation. To get the most recent ones:

SELECT m.*
FROM (SELECT m.*,
             ROW_NUMBER() OVER (PARTITION BY m.conversation_id ORDER BY m.created_at DESC) as seqnum
      FROM messages m
     ) m JOIN
     conversations_users cu
     ON m.conversation_id = cu.conversation_id
WHERE cu.user_id = 1 AND seqnum <= <n>
ORDER BY m.conversation_id, m.created_at ASC;

Another approach is to use a lateral join:

select m.*
from conversations_users cu cross join lateral
     (select m.*
      from messages m
      where m.conversation_id = cu.conversation_id
      order by m.created_at desc
      limit <n>
     ) m
where cu.user_id = 1
order by m.conversation_id, m.created_at;

I think this may perform better on larger data, but you would need to test it.
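To make the idea behind the lateral join concrete: conceptually it runs the limited subquery once for every conversations_users row that belongs to the user. The sketch below imitates that in Python with one LIMIT query per conversation against an in-memory SQLite database (SQLite has no LATERAL). It illustrates the semantics only and says nothing about Postgres performance; the integer created_at values and the helper name `latest_per_conversation` are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER,
        conversation_id INTEGER,
        message TEXT,
        created_at INTEGER  -- integer stand-in for a timestamp
    );
    CREATE TABLE conversations_users (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        conversation_id INTEGER
    );
    -- rows from the question; created_at simply mirrors the message id
    INSERT INTO messages (id, sender_id, conversation_id, message, created_at) VALUES
        (1, 1, 1, 'text', 1),
        (2, 2, 2, 'text', 2),
        (3, 2, 1, 'text', 3),
        (4, 3, 3, 'text', 4);
    INSERT INTO conversations_users (user_id, conversation_id) VALUES
        (1, 1), (2, 1), (2, 2), (3, 2), (3, 3), (1, 3);
""")


def latest_per_conversation(conn, user_id, n):
    """Imitate the lateral join: one LIMIT-ed lookup per conversation of the user."""
    conv_ids = [
        row[0]
        for row in conn.execute(
            "SELECT conversation_id FROM conversations_users "
            "WHERE user_id = ? ORDER BY conversation_id",
            (user_id,),
        )
    ]
    out = []
    for conv_id in conv_ids:
        # This inner query plays the role of the lateral subquery.
        out.extend(
            conn.execute(
                "SELECT id, conversation_id, created_at FROM messages "
                "WHERE conversation_id = ? ORDER BY created_at DESC LIMIT ?",
                (conv_id, n),
            )
        )
    return out


newest_one = latest_per_conversation(conn, 1, 1)
print(newest_one)
```

In Postgres the lateral join does all of this in a single statement, so the loop above is only a mental model of what the planner is asked to compute.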