Limit results of each group

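I want to limit the records in each group so that, when I aggregate them into a JSON object in the select statement, it only takes the N conversations with the highest count.

Any ideas?

My query: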

select
          dt.id as app_id,
          json_build_object(
              'rows', array_agg(
                 json_build_object(
                    'url', dt.started_at_url,
                    'count', dt.count
                 )
              )
          ) as data
      from (
          select a.id, c.started_at_url, count(c.id)
          from apps a
          left join conversations c on c.app_id = a.id
          where started_at_url is not null and c.started_at::date > (current_date - (7  || ' days')::interval)::date
          group by a.id, c.started_at_url
          order by count desc
      ) as dt
      where dt.id = 'ASnYW1-RgCl0I'
      group by dt.id

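Your problem is similar to the groupwise-max problem, and there are many ways to solve it.

Filtering with the row_number() window function

A simple approach is to use the row_number() window function and keep only the rows whose row number is <= N (using 5 as an example):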

select
          dt.id as app_id,
          json_build_object(
              'rows', array_agg(
                 json_build_object(
                    'url', dt.started_at_url,
                    'count', dt.count
                 )
              )
          ) as data
      from (
          select
              a.id, c.started_at_url,
              count(c.id) as count,
              row_number() over(partition by a.id order by count(c.id) desc) as rn
          from apps a
          left join conversations c on c.app_id = a.id
          where started_at_url is not null and c.started_at > (current_date - (7  || ' days')::interval)::date
          group by a.id, c.started_at_url
          order by count desc
      ) as dt
      where
          dt.id = 'ASnYW1-RgCl0I'
          and dt.rn <= 5 /* get top 5 only */
      group by dt.id

Using LATERAL

Another option is to use LATERAL together with LIMIT so that the subquery returns only the rows you are interested in:

select
    a.id as app_id,
    json_build_object(
        'rows', array_agg(
           json_build_object(
              'url', dt.started_at_url,
              'count', dt.count
           )
        )
    ) as data
from
    apps a, lateral(
        select
            c.started_at_url,
            count(*) as count
        from
            conversations c
        where
            c.app_id = a.id /* here is why lateral is necessary */
            and c.started_at_url is not null
            and c.started_at > (current_date - (7  || ' days')::interval)::date
        group by
            c.started_at_url
        order by
            count(*) desc
        limit 5 /* get top 5 only */
    ) as dt
where
    a.id = 'ASnYW1-RgCl0I'
group by
    a.id

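OBS: I haven't tried it, so there may be typos. You may provide a sample data set if you would like me to run some tests.

OBS 2: If you really are filtering by app_id in the final query, then you don't even need that GROUP BY clause.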
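
For anyone who wants to try the queries, a minimal sample data set could look like the sketch below. The column types are assumptions inferred from the queries above (the real schema was not posted), and the rows are made up purely for testing.

```sql
-- Hypothetical schema: column types are guessed from the queries above.
create table apps (
    id text primary key
);

create table conversations (
    id             serial primary key,
    app_id         text references apps (id),
    started_at_url text,
    started_at     timestamp not null default now()
);

insert into apps (id) values ('ASnYW1-RgCl0I'), ('other-app');

-- Made-up rows: /a occurs 3 times, /b twice, /c once for the first app,
-- plus one null URL and one row older than 7 days that the filters should drop.
insert into conversations (app_id, started_at_url, started_at) values
    ('ASnYW1-RgCl0I', '/a',   now()),
    ('ASnYW1-RgCl0I', '/a',   now()),
    ('ASnYW1-RgCl0I', '/a',   now()),
    ('ASnYW1-RgCl0I', '/b',   now()),
    ('ASnYW1-RgCl0I', '/b',   now()),
    ('ASnYW1-RgCl0I', '/c',   now()),
    ('ASnYW1-RgCl0I', null,   now()),
    ('ASnYW1-RgCl0I', '/old', now() - interval '30 days'),
    ('other-app',     '/a',   now());
```

With this data, lowering the limit from 5 to 2 in either query should leave only /a and /b for the app 'ASnYW1-RgCl0I'.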
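
As a sketch of what OBS 2 means: when only one app is requested, the inner query can filter on that app directly and the outer query can aggregate everything into a single row, so neither the join to apps nor the GROUP BY is needed. Like the queries above, this is untested. The order by inside array_agg is my own addition, to keep the JSON rows sorted by count.

```sql
select
    'ASnYW1-RgCl0I' as app_id,  /* the id is a constant here, so no GROUP BY is needed */
    json_build_object(
        'rows', array_agg(
            json_build_object(
                'url', dt.started_at_url,
                'count', dt.count
            )
            order by dt.count desc  /* assumption: highest count first */
        )
    ) as data
from (
    select
        c.started_at_url,
        count(*) as count
    from
        conversations c
    where
        c.app_id = 'ASnYW1-RgCl0I'
        and c.started_at_url is not null
        and c.started_at > (current_date - (7 || ' days')::interval)::date
    group by
        c.started_at_url
    order by
        count(*) desc
    limit 5 /* get top 5 only */
) as dt
```

Note that an aggregate without GROUP BY always returns exactly one row, so an app with no matching conversations gives 'rows' as null instead of returning no row at all.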