Fast PostgreSQL query with GROUP BY, COUNT(DISTINCT) and SUM on different columns

Question

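I am trying to query a table with roughly 1.5 million records. I have indexes, and performance is good.

However, for one of the columns I want a count of distinct values (the column has many duplicates). With DISTINCT, the query is 10 times slower than without it.

Here is the query: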

SELECT
    created_at,
    SUM(amount) as total,
    COUNT(DISTINCT partner_id) as count_partners
FROM 
    consumption
WHERE
    is_official = true
    AND
    (is_processed = true OR is_deferred = true)
GROUP BY created_at

This takes 2.5 seconds.

If I change it to:

COUNT(partner_id) as count_partners

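it takes 230 ms. But that is not what I want.

I want the number of unique partners per group (date) and the sum of the amounts they consumed in that period.

I don't understand why this is so slow. PostgreSQL seems to build the array with all the duplicates very quickly, so why does simply adding DISTINCT ruin the performance?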

Query plan:

GroupAggregate  (cost=85780.70..91461.63 rows=12 width=24) (actual time=1019.428..2641.434 rows=13 loops=1)
  Output: created_at, sum(amount), count(DISTINCT partner_id)
  Group Key: p.created_at
  Buffers: shared hit=16487
  ->  Sort  (cost=85780.70..87200.90 rows=568081 width=16) (actual time=865.599..945.674 rows=568318 loops=1)
        Output: created_at, amount, partner_id
        Sort Key: p.created_at
        Sort Method: quicksort  Memory: 62799kB
        Buffers: shared hit=16487
        ->  Seq Scan on public.consumption p  (cost=0.00..31484.26 rows=568081 width=16) (actual time=0.020..272.126 rows=568318 loops=1)
              Output: created_at, amount, partner_id
              Filter: (p.is_official AND (p.is_deferred OR p.is_processed))
              Rows Removed by Filter: 931408
              Buffers: shared hit=16487
Planning Time: 0.191 ms
Execution Time: 2647.629 ms

Indexes:

CREATE INDEX IF NOT EXISTS i_pid ON consumption (partner_id);
CREATE INDEX IF NOT EXISTS i_processed ON consumption (is_processed);
CREATE INDEX IF NOT EXISTS i_official ON consumption (is_official);
CREATE INDEX IF NOT EXISTS i_deferred ON consumption (is_deferred);
CREATE INDEX IF NOT EXISTS i_created ON consumption (created_at);

Answer 1

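The following query should be able to benefit from the indexes. It first aggregates per (created_at, partner_id), so the outer COUNT(DISTINCT partner_id) only sees one row per partner and date instead of every raw row.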

SELECT
  created_at,
  SUM(amount) AS total,
  COUNT(DISTINCT partner_id) AS count_partners
FROM
  -- Pre-aggregate: one row per (created_at, partner_id)
  (SELECT
    created_at,
    SUM(amount) AS amount,
    partner_id
  FROM consumption
  WHERE is_official = true
    AND (is_processed = true OR is_deferred = true)
  GROUP BY
    created_at,
    partner_id
  ) AS c
-- Roll the per-partner rows up to one row per date
GROUP BY created_at;
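
Since the subquery groups by (created_at, partner_id), each partner appears at most once per date in its output, so the outer DISTINCT has no duplicates left to remove. A possible simplification, assuming the same consumption table and columns as in the question, is to count the non-NULL partner_id rows directly. This is only a sketch and has not been timed against the data in the question:

```sql
-- Sketch: same grouping as the query above, but without DISTINCT
-- in the outer aggregate.
SELECT
  created_at,
  SUM(amount)       AS total,
  COUNT(partner_id) AS count_partners  -- one inner row per partner and date
FROM
  (SELECT
     created_at,
     partner_id,
     SUM(amount) AS amount
   FROM consumption
   WHERE is_official = true
     AND (is_processed = true OR is_deferred = true)
   GROUP BY
     created_at,
     partner_id
  ) AS c
GROUP BY created_at;
```

COUNT(partner_id) is used rather than COUNT(*) because it skips a NULL partner_id group, just as COUNT(DISTINCT partner_id) does. Running each variant under EXPLAIN (ANALYZE, BUFFERS) shows whether the change makes a difference on a given data set.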


postgresql