如何优化大型 psql 查询

How to optimize large psql query

我正在寻找 suggestions/direction 如何改进这个大型查询。

当我 explain/analyze 它时,我看到了一些弱点,例如过大的估计,以及对连接的顺序扫描缓慢。

然而,在检查了一些索引并深入研究之后,我仍然不知道如何改进它:

查询:

WITH my_activities AS (
            WITH people_relations AS (
                    SELECT people.id AS people_relations_id, array_agg(DISTINCT type) AS person_relations, companies.id AS company_id, companies.name AS company_name, companies.platform_url AS company_platform_url FROM people
                    INNER JOIN relationships AS person_relation ON platform_user_id = 6 AND person_relation.person_id = people.id AND person_relation.type != 'Suppressee'
                    LEFT OUTER JOIN companies ON people.company_id = companies.id
                    GROUP BY people.id, people.title, companies.id, companies.name, companies.platform_url)
            SELECT owner_person.id, 
                            owner_person.full_name,
                            owner_person.first_name,
                            owner_person.last_name,
                            owner_person.title,
                            owner_person.headshot AS owner_headshot,
                            owner_person.public_identifier AS owner_public_identifier,
                            owner_relations.person_relations AS owner_relationships,
                            owner_relations.company_id AS owner_company_id,
                            owner_relations.company_name AS owner_company_name,
                            owner_relations.company_platform_url AS owner_company_platform_url,
                            recipient_relations.person_relations AS recipient_relationships,
                            activities.id AS activity_id,
                            activities.key AS activity_key,
                            recipient.id AS recipient_id,
                            recipient.full_name AS recipient_full_name,
                            recipient.title AS recipient_title,
                            recipient.headshot AS recipient_headshot,
                            recipient.public_identifier AS recipient_public_identifier,
                            recipient_relations.company_name AS recipient_company_name,
                            recipient_relations.company_platform_url AS recipient_company_platform_url,
                            recipient_person.type AS recipient_relation,
                            coalesce(t_posts.id, t_post_likes.id, t_post_comments.id) AS trackable_id,
                            trackable_type,
                            coalesce(t_posts.post_date, t_post_comments.created_time, t_post_likes_post.post_date, activities.occurred_at) AS trackable_date,
                            coalesce(t_posts.permalink, t_post_comments.permalink, t_post_likes_post.permalink) AS trackable_permalink,
                            coalesce(t_posts.content, t_post_comments_post.content, t_post_likes_post.content) AS trackable_content,
                            trackable_companies.name AS trackable_company_name,
                            trackable_companies.platform_url AS trackable_company_platform_url,
                            t_post_comments.comment as trackable_comment FROM people AS owner_person

            INNER JOIN activities ON activities.owner_id = owner_person.id AND activities.owner_type = 'Person'
                                    AND ((activities.key = 'job.changed' AND activities.occurred_at > '2022-01-31 15:09:54') OR
                                                (activities.key != 'job.changed' AND activities.occurred_at > '2022-04-24 14:09:54'))
            LEFT OUTER JOIN li_user_activities ON activities.id = li_user_activities.activity_id AND li_user_activities.platform_user_id = 6
                                                AND li_user_activities.dismissed_at IS NULL 
            LEFT OUTER JOIN icp_ids ON owner_person.id = icp_ids.icp_id
            LEFT OUTER JOIN companies as trackable_companies ON trackable_companies.id = activities.trackable_id AND activities.trackable_type = 'Company'
            LEFT OUTER JOIN posts as t_posts ON activities.trackable_id = t_posts.id AND activities.trackable_type = 'Post'
            LEFT OUTER JOIN post_likes as t_post_likes ON activities.trackable_id = t_post_likes.id AND activities.trackable_type = 'PostLike'
            LEFT OUTER JOIN posts as t_post_likes_post ON t_post_likes.post_id = t_post_likes_post.id
            LEFT OUTER JOIN post_comments as t_post_comments ON activities.trackable_id = t_post_comments.id AND activities.trackable_type = 'PostComment'
            LEFT OUTER JOIN posts as t_post_comments_post ON t_post_comments.post_id = t_post_comments_post.id
            LEFT OUTER JOIN people AS recipient ON recipient.id = activities.recipient_id
            LEFT OUTER JOIN relationships AS recipient_person ON recipient_person.person_id = recipient.id
            INNER JOIN people_relations AS owner_relations ON owner_relations.people_relations_id = owner_person.id
            LEFT OUTER JOIN people_relations AS recipient_relations ON recipient_relations.people_relations_id = recipient.id
            WHERE ((recipient.id IS NULL OR recipient.id != owner_person.id) ) AND (key != 'asdasd'))
                SELECT owner_relationships AS owner_relationships,
                                json_agg(DISTINCT recipient_relationships) AS recipient_relationships,
                                id,
                                jsonb_build_object('id', id, 'first_name', first_name, 'last_name', last_name, 'full_name', full_name, 'title', title, 'headshot', owner_headshot, 'public_identifier', owner_public_identifier, 'profile_url', ('https://' || owner_public_identifier), 'company', jsonb_build_object( 'id', owner_company_id, 'name', owner_company_name, 'platform_url', owner_company_platform_url )) AS owner,
                                json_agg( DISTINCT jsonb_build_object('id', activity_id,
                                                                                        'key', activity_key,
                                                                                        'recipient', jsonb_build_object('id', recipient_id, 'full_name', recipient_full_name, 'title', recipient_title, 'headshot', recipient_headshot, 'public_identifier', recipient_public_identifier, 'profile_url', ('https://' || recipient_public_identifier), 'relation', recipient_relationships, 'company', jsonb_build_object('name', recipient_company_name, 'platform_url', recipient_company_platform_url)),
                                                                                        'trackable', jsonb_build_object('id', trackable_id, 'type', trackable_type, 'comment', trackable_comment, 'permalink', trackable_permalink, 'date', trackable_date, 'content', trackable_content, 'company_name', trackable_company_name, 'company_platform_url', trackable_company_platform_url)
                                                                                        )) AS data
                FROM my_activities
                GROUP BY id, first_name, last_name, full_name, title, owner_headshot, owner_public_identifier, owner_relationships, owner_company_id, owner_company_name, owner_company_platform_url

解释(也见于此处:https://explain.dalibo.com/plan/3pJg):

    GroupAggregate  (cost=654190.74..655692.10 rows=21448 width=298) (actual time=3170.209..3267.033 rows=327 loops=1)
  Group Key: my_activities.id, my_activities.first_name, my_activities.last_name, my_activities.full_name, my_activities.title, my_activities.owner_headshot, my_activities.owner_public_identifier, my_activities.owner_relationships, my_activities.owner_company_id, my_activities.owner_company_name, my_activities.owner_company_platform_url
  ->  Sort  (cost=654190.74..654244.36 rows=21448 width=674) (actual time=3168.944..3219.547 rows=2733 loops=1)
        Sort Key: my_activities.id, my_activities.first_name, my_activities.last_name, my_activities.full_name, my_activities.title, my_activities.owner_headshot, my_activities.owner_public_identifier, my_activities.owner_relationships, my_activities.owner_company_id, my_activities.owner_company_name, my_activities.owner_company_platform_url
        Sort Method: external merge  Disk: 3176kB
        ->  Subquery Scan on my_activities  (cost=638222.87..646193.71 rows=21448 width=674) (actual time=3142.221..3210.966 rows=2733 loops=1)
              ->  Hash Right Join  (cost=638222.87..645979.23 rows=21448 width=706) (actual time=3142.219..3210.753 rows=2733 loops=1)
                    Hash Cond: (recipient_relations.people_relations_id = recipient.id)
                    CTE people_relations
                      ->  GroupAggregate  (cost=142850.94..143623.66 rows=34343 width=152) (actual time=1556.908..1593.594 rows=33730 loops=1)
                            Group Key: people.id, companies.id
                            ->  Sort  (cost=142850.94..142936.80 rows=34343 width=129) (actual time=1556.875..1560.123 rows=33780 loops=1)
                                  Sort Key: people.id, companies.id
                                  Sort Method: external merge  Disk: 3816kB
                                  ->  Gather  (cost=1647.48..137915.08 rows=34343 width=129) (actual time=1405.433..1537.693 rows=33780 loops=1)
                                        Workers Planned: 2
                                        Workers Launched: 2
                                        ->  Nested Loop Left Join  (cost=647.48..133480.78 rows=14310 width=129) (actual time=570.743..710.682 rows=11260 loops=3)
                                              ->  Nested Loop  (cost=647.05..104036.25 rows=14310 width=55) (actual time=570.719..655.804 rows=11260 loops=3)
                                                    ->  Parallel Bitmap Heap Scan on relationships person_relation  (cost=646.62..13074.28 rows=14310 width=13) (actual time=570.627..579.277 rows=11260 loops=3)
                                                          Recheck Cond: (platform_user_id = 6)
                                                          Filter: ((type)::text <> 'Suppressee'::text)
                                                          Rows Removed by Filter: 12
                                                          Heap Blocks: exact=1642
                                                          ->  Bitmap Index Scan on index_relationships_on_platform_user_id_and_person_id  (cost=0.00..638.03 rows=34347 width=0) (actual time=2.254..2.254 rows=33829 loops=1)
                                                                Index Cond: (platform_user_id = 6)
                                                    ->  Index Scan using people_pkey on people  (cost=0.43..6.36 rows=1 width=46) (actual time=0.006..0.006 rows=1 loops=33780)
                                                          Index Cond: (id = person_relation.person_id)
                                              ->  Index Scan using companies_pkey on companies  (cost=0.43..2.06 rows=1 width=82) (actual time=0.005..0.005 rows=1 loops=33780)
                                                    Index Cond: (id = people.company_id)
                    ->  CTE Scan on people_relations recipient_relations  (cost=0.00..686.86 rows=34343 width=104) (actual time=0.018..4.247 rows=33730 loops=1)
                    ->  Hash  (cost=488466.12..488466.12 rows=21448 width=2209) (actual time=3142.015..3191.555 rows=2733 loops=1)
                          Buckets: 2048  Batches: 16  Memory Usage: 655kB
                          ->  Merge Join  (cost=487925.89..488466.12 rows=21448 width=2209) (actual time=3094.438..3187.748 rows=2733 loops=1)
                                Merge Cond: (owner_relations.people_relations_id = activities.owner_id)
                                ->  Sort  (cost=5272.71..5358.57 rows=34343 width=112) (actual time=1622.739..1626.249 rows=33730 loops=1)
                                      Sort Key: owner_relations.people_relations_id
                                      Sort Method: external merge  Disk: 4128kB
                                      ->  CTE Scan on people_relations owner_relations  (cost=0.00..686.86 rows=34343 width=112) (actual time=1556.912..1610.745 rows=33730 loops=1)
                                ->  Materialize  (cost=482653.17..482746.77 rows=18719 width=2113) (actual time=1471.676..1552.408 rows=69702 loops=1)
                                      ->  Sort  (cost=482653.17..482699.97 rows=18719 width=2113) (actual time=1471.672..1543.930 rows=69702 loops=1)
                                            Sort Key: owner_person.id
                                            Sort Method: external merge  Disk: 84608kB
                                            ->  Gather  (cost=64235.86..464174.85 rows=18719 width=2113) (actual time=1305.158..1393.927 rows=81045 loops=1)
                                                  Workers Planned: 2
                                                  Workers Launched: 2
                                                  ->  Parallel Hash Left Join  (cost=63235.86..461302.95 rows=7800 width=2113) (actual time=1289.165..1311.400 rows=27015 loops=3)
                                                        Hash Cond: (t_post_comments.post_id = t_post_comments_post.id)
                                                        ->  Nested Loop Left Join  (cost=51190.69..443455.30 rows=7800 width=1700) (actual time=443.623..511.046 rows=27015 loops=3)
                                                              Join Filter: ((activities.trackable_type)::text = 'PostComment'::text)
                                                              Rows Removed by Join Filter: 1756
                                                              ->  Parallel Hash Left Join  (cost=51190.26..395642.27 rows=7800 width=1408) (actual time=443.580..471.580 rows=27015 loops=3)
                                                                    Hash Cond: (recipient.id = recipient_person.person_id)
                                                                    ->  Nested Loop Left Join  (cost=26667.49..366532.83 rows=7800 width=1408) (actual time=214.602..348.548 rows=6432 loops=3)
                                                                          Filter: ((recipient.id IS NULL) OR (recipient.id <> owner_person.id))
                                                                          Rows Removed by Filter: 249
                                                                          ->  Nested Loop Left Join  (cost=26667.06..310170.84 rows=7800 width=1333) (actual time=214.591..338.396 rows=6681 loops=3)
                                                                                Join Filter: ((activities.trackable_type)::text = 'Company'::text)
                                                                                Rows Removed by Join Filter: 894
                                                                                ->  Hash Left Join  (cost=26666.63..257110.20 rows=7800 width=1259) (actual time=214.566..324.738 rows=6681 loops=3)
                                                                                      Hash Cond: (activities.id = li_user_activities.activity_id)
                                                                                      ->  Nested Loop  (cost=25401.21..255737.89 rows=7800 width=1259) (actual time=208.406..315.896 rows=6681 loops=3)
                                                                                            ->  Parallel Hash Left Join  (cost=25400.78..199473.40 rows=7800 width=1161) (actual time=208.367..216.663 rows=6681 loops=3)
                                                                                                  Hash Cond: (t_post_likes.post_id = t_post_likes_post.id)
                                                                                                  ->  Nested Loop Left Join  (cost=12700.61..182373.75 rows=7800 width=623) (actual time=143.176..167.675 rows=6681 loops=3)
                                                                                                        Join Filter: ((activities.trackable_type)::text = 'PostLike'::text)
                                                                                                        Rows Removed by Join Filter: 1095
                                                                                                        ->  Parallel Hash Left Join  (cost=12700.17..131647.07 rows=7800 width=611) (actual time=143.146..156.428 rows=6681 loops=3)
                                                                                                              Hash Cond: (activities.trackable_id = t_posts.id)
                                                                                                              Join Filter: ((activities.trackable_type)::text = 'Post'::text)
                                                                                                              Rows Removed by Join Filter: 1452
                                                                                                              ->  Parallel Seq Scan on activities  (cost=0.00..115613.42 rows=7800 width=61) (actual time=0.376..80.040 rows=6681 loops=3)
                                                                                                                    Filter: (((key)::text <> 'asdasd'::text) AND ((owner_type)::text = 'Person'::text) AND ((((key)::text = 'job.changed'::text) AND (occurred_at > '2022-01-31 15:09:54'::timestamp without time zone)) OR (((key)::text <> 'job.changed'::text) AND (occurred_at > '2022-04-24 14:09:54'::timestamp without time zone))))
                                                                                                                    Rows Removed by Filter: 27551
                                                                                                              ->  Parallel Hash  (cost=8996.19..8996.19 rows=44719 width=550) (actual time=57.638..57.639 rows=35776 loops=3)
                                                                                                                    Buckets: 8192  Batches: 16  Memory Usage: 4032kB
                                                                                                                    ->  Parallel Seq Scan on posts t_posts  (cost=0.00..8996.19 rows=44719 width=550) (actual time=0.032..14.451 rows=35776 loops=3)
                                                                                                        ->  Index Scan using post_likes_pkey on post_likes t_post_likes  (cost=0.43..6.49 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=20042)
                                                                                                              Index Cond: (id = activities.trackable_id)
                                                                                                  ->  Parallel Hash  (cost=8996.19..8996.19 rows=44719 width=550) (actual time=35.322..35.322 rows=35776 loops=3)
                                                                                                        Buckets: 8192  Batches: 16  Memory Usage: 4000kB
                                                                                                        ->  Parallel Seq Scan on posts t_post_likes_post  (cost=0.00..8996.19 rows=44719 width=550) (actual time=0.022..10.427 rows=35776 loops=3)
                                                                                            ->  Index Scan using people_pkey on people owner_person  (cost=0.43..7.21 rows=1 width=98) (actual time=0.014..0.014 rows=1 loops=20042)
                                                                                                  Index Cond: (id = activities.owner_id)
                                                                                      ->  Hash  (cost=951.58..951.58 rows=25107 width=4) (actual time=6.115..6.116 rows=25698 loops=3)
                                                                                            Buckets: 32768  Batches: 1  Memory Usage: 1160kB
                                                                                            ->  Seq Scan on li_user_activities  (cost=0.00..951.58 rows=25107 width=4) (actual time=0.011..3.578 rows=25698 loops=3)
                                                                                                  Filter: ((dismissed_at IS NULL) AND (platform_user_id = 6))
                                                                                                  Rows Removed by Filter: 15722
                                                                                ->  Index Scan using companies_pkey on companies trackable_companies  (cost=0.43..6.79 rows=1 width=82) (actual time=0.002..0.002 rows=0 loops=20042)
                                                                                      Index Cond: (id = activities.trackable_id)
                                                                          ->  Index Scan using people_pkey on people recipient  (cost=0.43..7.21 rows=1 width=83) (actual time=0.001..0.001 rows=1 loops=20042)
                                                                                Index Cond: (id = activities.recipient_id)
                                                                    ->  Parallel Hash  (cost=16874.67..16874.67 rows=466168 width=4) (actual time=79.735..79.736 rows=372930 loops=3)
                                                                          Buckets: 131072  Batches: 16  Memory Usage: 3840kB
                                                                          ->  Parallel Seq Scan on relationships recipient_person  (cost=0.00..16874.67 rows=466168 width=4) (actual time=0.021..35.805 rows=372930 loops=3)
                                                              ->  Index Scan using post_comments_pkey on post_comments t_post_comments  (cost=0.42..6.12 rows=1 width=300) (actual time=0.001..0.001 rows=0 loops=81045)
                                                                    Index Cond: (id = activities.trackable_id)
                                                        ->  Parallel Hash  (cost=8996.19..8996.19 rows=44719 width=425) (actual time=726.076..726.076 rows=35776 loops=3)
                                                              Buckets: 16384  Batches: 16  Memory Usage: 3264kB
                                                              ->  Parallel Seq Scan on posts t_post_comments_post  (cost=0.00..8996.19 rows=44719 width=425) (actual time=479.054..488.703 rows=35776 loops=3)
Planning Time: 5.286 ms
JIT:
  Functions: 304
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 22.990 ms, Inlining 260.865 ms, Optimization 1652.601 ms, Emission 1228.811 ms, Total 3165.267 ms
Execution Time: 3303.637 ms

更新: 这是 jit=off 的计划:

https://explain.dalibo.com/plan/EXn

看起来你基本上所有的时间都在做 just-in-time 编译。关闭 JIT(在配置文件中 jit=off,或在会话中 set jit=off;。)

我认为关闭 JIT 会使它下降更多,因为最初将 3303.637 - 3165.267 = 138 毫秒以外的所有时间都归因于 JIT。您应该在 JIT 打开和关闭之间交替几次,以查看您最初报告的时间是否可重现,或者可能只是缓存效果的差异。

另外,你报告的时间比计划本身报告的时间长2-3倍。这是您应该检查的另一件事,以了解它的可重现性。也许大部分时间都花在格式化要发送的数据上,或者花在通过网络发送数据上。 (这似乎不太可能只有 240 行,但我不知道还有什么可以解释它。)

现在花费的时间在整个计划中分布得很薄,所以没有任何可以对任何节点进行的更改会对整体时间产生重大影响。而且我没有看到估计错误正在推动任何计划选择,而更好的估计会导致更好的选择。

鉴于缺乏明确的瓶颈,加速的机会可能是更快的驱动器或更多的 RAM 用于缓存或增加 max_parallel_workers_per_gather 以便您可以并行完成更多工作。

看问题的文字,我不明白它的动机是什么,所以限制了我提出建议的能力。但是那里有很多不同的东西。是否有一些连接生成了不必要的重复行,然后用 DISTINCT 将其压缩回来?如果是这样,也许使用 WHERE exists (...) 可以改善情况。