Optimizing a PostGIS query - why is the 2nd index not being used?

Question

We have a table containing tens of millions of polygons, and we have this index on it:

CREATE INDEX IF NOT EXISTS polygons_geog_idx ON polygons USING GIST(geog);

This lets us query the database really efficiently, like so:

SELECT * FROM polygons WHERE st_dwithin('SRID=4326;POINT(-1 50)'::geography, geog, 500);

Now, due to a business requirement, we only need to return the 200 largest polygons. This is easily done like so:

ORDER BY st_area(geog) DESC
LIMIT 200

Full query:

SELECT gid, st_area(geog) AS size FROM polygons WHERE st_dwithin(geog, 'SRID=4326;POINT(-1 50)'::geography, 500) ORDER BY st_area(geog) DESC LIMIT 200;

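Because of the ORDER BY and the extra SELECT expression, our query is now about 10 times slower. I thought this could easily be solved by adding another index:

CREATE INDEX polygons_geog_area_idx ON polygons (st_area(geog));

But polygons_geog_area_idx does not seem to get picked up: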

Sort  (cost=8.23..8.23 rows=1 width=12) (actual time=133.755..142.427 rows=2325 loops=1)
  Sort Key: (st_area(geog, true))
  Sort Method: quicksort  Memory: 205kB
  ->  Index Scan using polygons_geog_idx on polygons  (cost=0.14..8.22 rows=1 width=12) (actual time=0.468..121.974 rows=2325 loops=1)
        Index Cond: (geog && '0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography)
        Filter: (('0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography && _st_expand(geog, '500'::double precision)) AND _st_dwithin(geog, '0101000020E6100000C33126587787F1BF3B0D62B197654940'::geography, '500'::double precision, true))
        Rows Removed by Filter: 3
Planning Time: 0.157 ms
Execution Time: 151.196 ms

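(Note: this is on a dev dataset, much smaller than the real dataset the query will later run against.)

What am I missing? Can two indexes be used the way I want?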

Answer 1

PostgreSQL cannot combine two indexes this way, using one for ordering and the other for selectivity.

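To order by area, the area first has to be computed. The sort itself is fast (it takes only about 15% of the time), so it must be the calculation of the area that is slow. EXPLAIN VERBOSE suggests to me that the area is calculated as part of the index scan and the result is then passed up to the sort, rather than being calculated in the sort itself. So it makes sense that the time spent on it is attributed to the index scan.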
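One way to check this yourself is to rerun the question's query under EXPLAIN with the VERBOSE option. This is only a sketch: the exact plan and its wording depend on your PostgreSQL and PostGIS versions and on your data.

```sql
-- Run the query from the question with per-node output lists and timings.
EXPLAIN (ANALYZE, VERBOSE)
SELECT gid, st_area(geog) AS size
FROM polygons
WHERE st_dwithin(geog, 'SRID=4326;POINT(-1 50)'::geography, 500)
ORDER BY st_area(geog) DESC
LIMIT 200;

-- In the output, look at the "Output:" line under each plan node.
-- If the st_area(...) expression appears in the Output list of the
-- Index Scan node, the area is computed there and handed to the Sort,
-- so its cost shows up in the index scan's "actual time".
```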

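To cut down the time spent computing the area, you can compute it once and store it as part of the table. The best way to do that (on a recent enough version, PostgreSQL 12 or later) is a generated column.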

alter table polygons add polygon_area double precision generated always as (st_area(geog)) stored;
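With that column in place, the query can sort on the stored value instead of calling st_area for every matching row. A minimal sketch, assuming the polygon_area column was created exactly as in the statement above:

```sql
-- Same spatial filter as before (still served by polygons_geog_idx),
-- but ordering and output use the precomputed polygon_area column.
SELECT gid, polygon_area AS size
FROM polygons
WHERE st_dwithin(geog, 'SRID=4326;POINT(-1 50)'::geography, 500)
ORDER BY polygon_area DESC
LIMIT 200;
```

The sort still happens after the spatial index scan, since the two indexes cannot be combined, but each row now only needs a column read rather than an area computation. Keep in mind that adding a stored generated column has to compute the value for every existing row, so on tens of millions of polygons the ALTER TABLE itself may take a while.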

postgresql

indexing

performance

postgis