Why are indexed ORDER BY queries matching many rows a LOT faster than queries matching only a few?
OK, I have the following query:
explain analyze SELECT seller_region FROM "products"
WHERE "products"."seller_region" = 'Bremen'
AND "products"."state" = 'active'
ORDER BY products.rank DESC,
products.score ASC NULLS LAST,
GREATEST(products.created_at, products.price_last_updated_at) DESC
LIMIT 14 OFFSET 0
The query's filter matches about 11,000 rows. Looking at the query plan, we can see that the query uses the index index_products_active_for_default_order and is very fast:
Limit (cost=0.43..9767.16 rows=14 width=36) (actual time=1.576..6.711 rows=14 loops=1)
-> Index Scan using index_products_active_for_default_order on products (cost=0.43..4951034.14 rows=7097 width=36) (actual time=1.576..6.709 rows=14 loops=1)
Filter: ((seller_region)::text = 'Bremen'::text)
Rows Removed by Filter: 3525
Total runtime: 6.724 ms
Now, if I replace 'Bremen' with 'Sachsen' in the query:
explain analyze SELECT seller_region FROM "products"
WHERE "products"."seller_region" = 'Sachsen'
AND "products"."state" = 'active'
ORDER BY products.rank DESC,
products.score ASC NULLS LAST,
GREATEST(products.created_at, products.price_last_updated_at) DESC
LIMIT 14 OFFSET 0
the same query matches only about 70 rows, and it is now consistently very, very slow, even though it uses the same index in exactly the same way:
Limit (cost=0.43..1755.00 rows=14 width=36) (actual time=2.498..1831.737 rows=14 loops=1)
-> Index Scan using index_products_active_for_default_order on products (cost=0.43..4951034.14 rows=39505 width=36) (actual time=2.496..1831.727 rows=14 loops=1)
Filter: ((seller_region)::text = 'Sachsen'::text)
Rows Removed by Filter: 963360
Total runtime: 1831.760 ms
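I don't understand why this happens. Intuitively I would expect the query that matches more rows to be slower, but the opposite is true. I have tested this with other queries on other columns of my table and see the same behavior: for two similar queries with the same ORDER BY as above, the one matching more rows is about 100 times faster than the one whose filter matches only a few rows. Why is this, and how can I avoid it?
PS: I'm using PostgreSQL 9.3, and the index is defined as follows: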
CREATE INDEX index_products_active_for_default_order
ON products
USING btree
(rank DESC, score COLLATE pg_catalog."default", (GREATEST(created_at, price_last_updated_at)) DESC)
WHERE state::text = 'active'::text;
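The index scan walks the index in ORDER BY order and discards every row whose seller_region does not match until it has collected 14 rows. For Bremen, the first 14 matching rows are found within the first 3539 index rows, whereas for Sachsen 963374 rows have to be scanned.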
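Both figures follow from the plans: rows scanned equals "Rows Removed by Filter" plus the 14 rows returned. The plans also hint at why PostgreSQL chose this index for Sachsen at all: each Limit cost equals the index scan's startup cost plus its run cost scaled by 14 / (estimated matching rows), and the planner estimated 39505 Sachsen rows instead of the roughly 70 that actually exist, so it expected to stop almost immediately. The Python sketch below only redoes that arithmetic with numbers copied from the two plans; the linear-interpolation formula is an assumption that reproduces both Limit costs shown, not a quotation of PostgreSQL's source.

```python
# Figures copied from the two EXPLAIN ANALYZE outputs in the question.
LIMIT = 14
STARTUP_COST = 0.43        # Index Scan startup cost (same in both plans)
TOTAL_COST = 4951034.14    # Index Scan total cost (same in both plans)

plans = {
    # region: (Rows Removed by Filter, planner's estimated matching rows)
    "Bremen": (3525, 7097),
    "Sachsen": (963360, 39505),
}


def rows_scanned(removed: int, returned: int = LIMIT) -> int:
    """Index rows visited before the Limit node had all of its rows."""
    return removed + returned


def limit_cost(est_rows: int, limit: int = LIMIT) -> float:
    """Assumed Limit costing: pay limit/est_rows of the scan's run cost,
    i.e. assume matching rows are spread evenly through the index."""
    return STARTUP_COST + (TOTAL_COST - STARTUP_COST) * limit / est_rows


for region, (removed, est_rows) in plans.items():
    print(f"{region}: scanned {rows_scanned(removed)} index rows, "
          f"estimated Limit cost {limit_cost(est_rows):.2f}")
```

It prints 3539 rows and a cost of 9767.16 for Bremen, and 963374 rows and 1755.00 for Sachsen, matching the plans. Because the Sachsen estimate was far too high, its plan looked cheaper to the planner than Bremen's, while in reality it was by far the more expensive one.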
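I recommend an index on (seller_region, rank), so that the rows of a single region can be looked up directly instead of being searched for across the whole rank order.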
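To illustrate why an index that leads with seller_region avoids the problem, here is a toy Python simulation of the two access paths. The table, the region names "Common" and "Rare", and the helper functions are all hypothetical; the ORDER BY is reduced to rank alone, and the B-tree is modelled as a sorted Python list searched with bisect. The "Rare" rows are deliberately placed at the lowest ranks, the worst case for a rank DESC scan, which is roughly the situation the Sachsen plan points to.

```python
from bisect import bisect_left

LIMIT = 14


def make_table(n: int) -> list[tuple[int, str]]:
    """Hypothetical (rank, seller_region) rows. 'Common' owns every 10th rank;
    'Rare' owns only the 70 lowest ranks, i.e. the end of a rank DESC scan."""
    rows = []
    for rank in range(n):
        if rank < 70:
            region = "Rare"
        elif rank % 10 == 0:
            region = "Common"
        else:
            region = "Other"
        rows.append((rank, region))
    return rows


def scan_rank_index(rows, region):
    """Like the existing plan: walk rows in rank DESC order, filter on region,
    stop once LIMIT matches are found. Returns (matched ranks, rows read)."""
    matches, read = [], 0
    for rank, reg in sorted(rows, reverse=True):
        read += 1
        if reg == region:
            matches.append(rank)
            if len(matches) == LIMIT:
                break
    return matches, read


def scan_region_rank_index(rows, region):
    """Like an index on (seller_region, rank DESC): entries sorted by
    (region, -rank). Jump to the region's first entry with bisect (the tree
    descent is not counted), then read entries already in rank DESC order."""
    index = sorted((reg, -rank) for rank, reg in rows)
    pos = bisect_left(index, (region, float("-inf")))
    matches, read = [], 0
    while pos < len(index) and index[pos][0] == region and len(matches) < LIMIT:
        read += 1
        matches.append(-index[pos][1])
        pos += 1
    return matches, read


table = make_table(100_000)
for region in ("Common", "Rare"):
    old, old_read = scan_rank_index(table, region)
    new, new_read = scan_region_rank_index(table, region)
    assert old == new  # both access paths return the same top-LIMIT ranks
    print(f"{region}: rank index read {old_read} rows, "
          f"(region, rank) index read {new_read} rows")
```

With the rank-only index, the work depends on where the first 14 matches sit in rank order (140 rows for "Common", 99,944 for "Rare"); with the region-first index, both read just 14 entries. With the real three-column ORDER BY and an index on only (seller_region, rank), PostgreSQL 9.3 would most likely fetch all rows of the region and sort them, which is still cheap for 70 or 11,000 rows.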