Why doesn't this query use the indexes I've created?
I'm running Postgres 9.5.
SELECT events.id, events.start_time, events.host_id, events.title from events
JOIN accountsevents ON accountsevents.events_id = events.id
WHERE accountsevents.accounts_id = %(account_id)s OR events.host_id = %(account_id)s
GROUP BY events.id
ORDER BY start_time DESC
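I have this query, and Postgres says the cost is over 100k. That seems excessive. It is the only query I have that doesn't use the indexes I've created for each table.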
Indexes:
"events_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"events_host_id_fkey" FOREIGN KEY (host_id) REFERENCES accounts(id)
Referenced by:
TABLE "accountsevents" CONSTRAINT "accountsevents_events_id_fkey" FOREIGN KEY (events_id) REFERENCES events(id)
TABLE "eventsinterests" CONSTRAINT "eventsinterests_events_id_fkey" FOREIGN KEY (events_id) REFERENCES events(id)
Indexes:
"accountsevents_pkey" PRIMARY KEY, btree (id, accounts_id, events_id)
Foreign-key constraints:
"accountsevents_accounts_id_fkey" FOREIGN KEY (accounts_id) REFERENCES accounts(id)
"accountsevents_events_id_fkey" FOREIGN KEY (events_id) REFERENCES events(id)
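I think the indexes are set up incorrectly, or I'm just missing something in the query. The initial Seq Scan is killing it.
Added the verbose EXPLAIN output: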
"Sort (cost=124388.27..124390.10 rows=732 width=40) (actual time=1533.902..1533.928 rows=470 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
" Sort Key: events.start_time DESC"
" Sort Method: quicksort Memory: 66kB"
" -> HashAggregate (cost=124346.12..124353.44 rows=732 width=40) (actual time=1533.658..1533.759 rows=470 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
" Group Key: events.id"
" -> Hash Join (cost=4912.30..124344.29 rows=732 width=40) (actual time=56.671..1532.831 rows=971 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
" Hash Cond: (accountsevents.events_id = events.id)"
" Join Filter: ((accountsevents.accounts_id = 1) OR (events.host_id = 1))"
" Rows Removed by Join Filter: 2761882"
" -> Seq Scan on public.accountsevents (cost=0.00..45228.52 rows=2762852 width=8) (actual time=0.005..466.094 rows=2762853 loops=1)"
" Output: accountsevents.events_id, accountsevents.accounts_id"
" -> Hash (cost=2795.91..2795.91 rows=104191 width=40) (actual time=53.579..53.579 rows=104181 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
" Buckets: 65536 Batches: 4 Memory Usage: 2462kB"
" -> Seq Scan on public.events (cost=0.00..2795.91 rows=104191 width=40) (actual time=0.004..26.171 rows=104181 loops=1)"
" Output: events.id, events.start_time, events.host_id, events.title"
"Planning time: 0.201 ms"
"Execution time: 1534.024 ms"
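No index can help you with this query.
The problem is that you have an OR in your WHERE condition, so the filter cannot be applied before the tables are joined, which is where an index could help you. Try replacing OR with AND and see how much better PostgreSQL can do.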
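As a rough sketch of that experiment (the literal 1 stands in for %(account_id)s, as in the plan above; the AND version returns different rows and is only useful for comparing plans, not as a replacement):

```sql
-- Experiment only: AND does not return the same rows as OR.
-- With AND, each condition refers to one table and can be applied
-- to that table's scan before the join happens.
EXPLAIN (ANALYZE, VERBOSE)
SELECT events.id, events.start_time, events.host_id, events.title
FROM events
JOIN accountsevents ON accountsevents.events_id = events.id
WHERE accountsevents.accounts_id = 1
  AND events.host_id = 1
GROUP BY events.id
ORDER BY events.start_time DESC;
```

In that plan the two conditions should appear as Filter (or Index Cond) lines on the individual scans instead of as one Join Filter on top of the join.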
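As it stands, PostgreSQL has to compute the whole join and can only filter rows out afterwards; see the large number of rows removed by the join filter (2761882).
The only plan that could use an index is a nested loop join, which would be even more expensive. So I don't think there is a better plan for this query.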
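If you want to check the nested loop claim on your own data, one way is to temporarily discourage the other join methods and compare timings. This is only a diagnostic sketch, assuming you can run it in a scratch session; enable_hashjoin and enable_mergejoin are standard planner settings, and SET LOCAL limits the change to the transaction:

```sql
BEGIN;
-- Diagnostic only: steer the planner away from hash and merge joins
-- so that it falls back to a nested loop for this transaction.
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;

EXPLAIN (ANALYZE, VERBOSE)
SELECT events.id, events.start_time, events.host_id, events.title
FROM events
JOIN accountsevents ON accountsevents.events_id = events.id
WHERE accountsevents.accounts_id = 1 OR events.host_id = 1
GROUP BY events.id
ORDER BY events.start_time DESC;

ROLLBACK;  -- discards the SET LOCAL changes
```

Note that EXPLAIN ANALYZE actually executes the query, so expect this to take at least as long as the original run.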
You can see that PostgreSQL's row count estimates are very good, which is usually a sign that PostgreSQL is indeed doing the right thing.
Maybe you can do better with a query like this:
SELECT * FROM
(SELECT ... FROM events JOIN accountsevents ON ...
WHERE accountsevents.accounts_id = 1
UNION
SELECT ... FROM events JOIN accountsevents ON ...
WHERE events.host_id = 1) sub
GROUP BY ... ORDER BY ...
But I wouldn't bet on it.
(Note: this query is semantically slightly different, but that probably doesn't matter in your case.)
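Spelled out, the UNION idea might look roughly like the following. This is a sketch under assumptions, not a tested fix: the three indexes and their names are hypothetical (none of them appear in the schema you posted), the literal 1 stands in for %(account_id)s, and the outer GROUP BY is dropped because UNION without ALL already removes duplicate rows.

```sql
-- Hypothetical indexes; the names are made up for illustration.
CREATE INDEX accountsevents_accounts_id_idx ON accountsevents (accounts_id);
CREATE INDEX accountsevents_events_id_idx   ON accountsevents (events_id);
CREATE INDEX events_host_id_idx             ON events (host_id);

-- Each branch has a single condition on a single table,
-- so that condition can be applied before the join.
SELECT id, start_time, host_id, title
FROM (
    SELECT events.id, events.start_time, events.host_id, events.title
    FROM events
    JOIN accountsevents ON accountsevents.events_id = events.id
    WHERE accountsevents.accounts_id = 1
    UNION
    SELECT events.id, events.start_time, events.host_id, events.title
    FROM events
    JOIN accountsevents ON accountsevents.events_id = events.id
    WHERE events.host_id = 1
) sub
ORDER BY start_time DESC;
```

Whether the planner actually uses these indexes depends on how selective accounts_id and host_id are in your data, so compare the result with EXPLAIN (ANALYZE) before keeping any of them.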