使用 postgreSQL 的 ORDER BY COUNT 优化（索引？）

Question

这是一个简单的查询，用于获取按 participations 计数排序的 projects 列表。

SELECT projects.*, COUNT(participations) as ct from projects
  INNER JOIN participations ON participations.project_id=projects.id
  GROUP BY projects.id
ORDER BY ct DESC
LIMIT 5;

我并不惊讶它这么慢（2 秒）。一个没有按 participations 计数排序的简单查询，速度超快（正常，~10ms）。

有 ~2k projects 和 ~430k participations。

我应该怎么做才能加快这个请求？我觉得我必须添加一个索引，但我不知道哪个（或哪些？）以及如何。

编辑：

Table paticipations:

create table participations
(
    id serial not null
        constraint participations_pkey
            primary key,
    project_id integer not null
        constraint participations_project_id_foreign
            references projects
        constraint participations_service_id_foreign
            references projects,
    status varchar(32) default 'unknown'::character varying not null,
    data jsonb,
    created_at timestamp(0),
    updated_at timestamp(0),
    deleted_at timestamp(0)
);

create index participations_status_index on participations (status);
create index participation_data_idx on participations (data);

Table projects:

create table projects
(
    id serial not null
        constraint services_pkey
            primary key,
    user_id integer not null
        constraint services_user_id_foreign
            references users,
    type varchar(50) not null,
    name varchar(255) not null,
    url varchar(255) not null,
    start_at timestamp(0),
    end_at timestamp(0),
    visibility varchar(255) not null,
    status varchar(255) not null,
    metadata jsonb not null,
    data jsonb not null,
    created_at timestamp(0),
    updated_at timestamp(0),
    deleted_at timestamp(0)
);

create index services_type_index on projects (type);
create index services_name_index on projects (name);
create index services_url_index on projects (url);
create index services_start_at_index on projects (start_at);
create index services_end_at_index on projects (end_at);
create index services_visibility_index on projects (visibility);
create index services_status_index on projects (status);

查询计划：

Limit  (cost=487359.41..487359.43 rows=5 width=2218) (actual time=2177.264..2177.265 rows=5 loops=1)
Buffers: shared hit=37446 read=3493, temp read=21051 written=39402
->  Sort  (cost=487359.41..487366.68 rows=2907 width=2218) (actual time=2177.262..2177.262 rows=5 loops=1)
        Sort Key: (count(participations.*)) DESC
        Sort Method: top-N heapsort  Memory: 33kB
        Buffers: shared hit=37446 read=3493, temp read=21051 written=39402
        ->  GroupAggregate  (cost=477474.47..487311.13 rows=2907 width=2218) (actual time=1170.136..2175.445 rows=1884 loops=1)
            Group Key: projects.id
            Buffers: shared hit=37446 read=3493, temp read=21051 written=39402
            ->  Merge Join  (cost=477474.47..485105.80 rows=435252 width=2218) (actual time=1170.122..2120.557 rows=435207 loops=1)
                    Merge Cond: (projects.id = participations.project_id)
                    Buffers: shared hit=37446 read=3493, temp read=21051 written=39402
                    ->  Sort  (cost=625.30..632.57 rows=2907 width=1131) (actual time=22.443..27.047 rows=2879 loops=1)
                        Sort Key: projects.id
                        Sort Method: external sort  Disk: 3216kB
                        Buffers: shared hit=429, temp read=402 written=402
                        ->  Seq Scan on projects  (cost=0.00..458.07 rows=2907 width=1131) (actual time=0.011..1.532 rows=2879 loops=1)
                                Buffers: shared hit=429
                    ->  Materialize  (cost=476849.12..479025.38 rows=435252 width=1091) (actual time=1147.671..1996.166 rows=435207 loops=1)
                        Buffers: shared hit=37017 read=3493, temp read=20649 written=39000
                        ->  Sort  (cost=476849.12..477937.25 rows=435252 width=1091) (actual time=1147.666..1400.856 rows=435207 loops=1)
                                Sort Key: participations.project_id
                                Sort Method: external merge  Disk: 165040kB
                                Buffers: shared hit=37017 read=3493, temp read=20649 written=20649
                                ->  Seq Scan on participations  (cost=0.00..19528.52 rows=435252 width=1091) (actual time=0.012..318.347 rows=435207 loops=1)
                                    Buffers: shared hit=37017 read=3493
Planning time: 1.309 ms
Execution time: 2203.249 ms

Answer 1

您可以尝试先聚合，然后加入聚合：

SELECT projects.*, pt.ct 
from projects 
  join (
    select project_id, count(*) as ct 
    from participations 
    GROUP BY project_id
  ) pt ON pt.project_id = projects.id
ORDER BY ct DESC
LIMIT 5;

另外：您所有的排序都是在磁盘上完成的。如果你增加 work_mem 这也应该提高性能。

Answer 2

order by不能使用索引，因为它基于计算值。

我可能建议在子查询中执行 order by 和 limit：

SELECT pr.*, p.ct 
FROM projects pr JOIN
     (SELECT p.project_id, COUNT(*) as ct
      FROM participations p
      GROUP BY p.project_id
      ORDER BY ct DESC
      LIMIT 5
     ) p
     ON p.project_id = pr.id;

使用 postgreSQL 的 ORDER BY COUNT 优化（索引？）

Optimization (indexing?) for ORDER BY COUNT with postgreSQL

sql

postgresql

indexing

query-performance

postgresql-9.5