Spatial query on large table with multiple self joins performing slow

Question

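I am working on queries against a large table in Postgres 9.3.9. It is a spatial dataset and has spatial indexes. Let's say I need to find 3 types of objects: A, B and C. The condition is that both B and C are within a certain distance of A, say 500 meters.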

My query looks like this:

select 
  school.osm_id as school_osm_id, 
  school.name as school_name, 
  school.way as school_way, 
  restaurant.osm_id as restaurant_osm_id, 
  restaurant.name as restaurant_name, 
  restaurant.way as restaurant_way, 
  bar.osm_id as bar_osm_id, 
  bar.name as bar_name, 
  bar.way as bar_way 
from (
    select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'school') as school, 
   (select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'restaurant') as restaurant, 
   (select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'bar') as bar 
where ST_DWithin(school.way_geo, restaurant.way_geo, 500, false) 
  and ST_DWithin(school.way_geo, bar.way_geo, 500, false);

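This query gives me what I want, but it takes a long time to execute, about 13 seconds. I wonder whether there is another way to write the query that is more efficient.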
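The query returns one row per (school, restaurant, bar) combination. A school with r restaurants and b bars within 500 m therefore contributes r × b rows. A minimal sketch of that multiplication follows. It uses Python's built-in sqlite3 as a stand-in for PostGIS. The `dwithin` function, the planar `x`/`y` columns (in meters), and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical planar stand-in for ST_DWithin(geography, geography, 500, false):
# coordinates are plain x/y values in meters, not lon/lat.
def dwithin(x1, y1, x2, y2, d):
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 <= d * d

def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.create_function("dwithin", 5, dwithin)
    conn.execute("CREATE TABLE planet_osm_point (osm_id INTEGER, amenity TEXT, x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO planet_osm_point VALUES (?, ?, ?, ?)",
        [
            (1, "school", 0, 0),
            (2, "restaurant", 100, 0), (3, "restaurant", 0, 200),
            (4, "bar", 300, 0), (5, "bar", 0, -400), (6, "bar", -450, 0),
            (7, "school", 5000, 0),  # has a restaurant nearby but no bar
            (8, "restaurant", 5100, 0),
        ],
    )
    return conn

# Same shape as the question's query: three filtered subqueries, comma-joined.
TRIPLE_QUERY = """
SELECT school.osm_id, restaurant.osm_id, bar.osm_id
FROM (SELECT * FROM planet_osm_point WHERE amenity = 'school') AS school,
     (SELECT * FROM planet_osm_point WHERE amenity = 'restaurant') AS restaurant,
     (SELECT * FROM planet_osm_point WHERE amenity = 'bar') AS bar
WHERE dwithin(school.x, school.y, restaurant.x, restaurant.y, 500)
  AND dwithin(school.x, school.y, bar.x, bar.y, 500)
"""

def count_nearby(conn, school_id, amenity):
    # Number of points of the given amenity within 500 m of one school.
    return conn.execute(
        """SELECT count(*) FROM planet_osm_point s, planet_osm_point p
           WHERE s.osm_id = ? AND p.amenity = ?
             AND dwithin(s.x, s.y, p.x, p.y, 500)""",
        (school_id, amenity),
    ).fetchone()[0]

conn = build_sample()
triples = conn.execute(TRIPLE_QUERY).fetchall()
schools = [r[0] for r in conn.execute("SELECT osm_id FROM planet_osm_point WHERE amenity = 'school'")]
# Sum over schools of (restaurants nearby) * (bars nearby).
expected = sum(count_nearby(conn, s, "restaurant") * count_nearby(conn, s, "bar") for s in schools)
print(len(triples), expected)  # 6 6
```

School 1 yields 2 × 3 = 6 rows. School 7 yields 1 × 0 = 0 rows, so it disappears from the result entirely.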

Query plan:

Nested Loop  (cost=74.43..28618.65 rows=1 width=177) (actual time=33.513..11235.212 rows=10591 loops=1)
   Buffers: shared hit=530967 read=8733
   ->  Nested Loop  (cost=46.52..28586.46 rows=1 width=174) (actual time=31.998..9595.212 rows=4235 loops=1)
         Buffers: shared hit=389863 read=8707
         ->  Bitmap Heap Scan on planet_osm_point  (cost=18.61..2897.83 rows=798 width=115) (actual time=7.862..150.607 rows=8811 loops=1)
               Recheck Cond: (amenity = 'school'::text)
               Buffers: shared hit=859 read=5204
               ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=5.416..5.416 rows=8811 loops=1)
                     Index Cond: (amenity = 'school'::text)
                     Buffers: shared hit=3 read=24
         ->  Bitmap Heap Scan on planet_osm_point planet_osm_point_1  (cost=27.91..32.18 rows=1 width=115) (actual time=1.064..1.069 rows=0 loops=8811)
               Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'restaurant'::text))
               Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
               Rows Removed by Filter: 0
               Buffers: shared hit=389004 read=3503
               ->  BitmapAnd  (cost=27.91..27.91 rows=1 width=0) (actual time=1.058..1.058 rows=0 loops=8811)
                     Buffers: shared hit=384528 read=2841
                     ->  Bitmap Index Scan on idx_planet_osm_point_waygeo  (cost=0.00..9.05 rows=137 width=0) (actual time=0.193..0.193 rows=64 loops=8811)
                           Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                           Buffers: shared hit=146631 read=2841
                     ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=0.843..0.843 rows=6291 loops=8811)
                           Index Cond: (amenity = 'restaurant'::text)
                           Buffers: shared hit=237897
   ->  Bitmap Heap Scan on planet_osm_point planet_osm_point_2  (cost=27.91..32.18 rows=1 width=115) (actual time=0.375..0.383 rows=3 loops=4235)
         Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'bar'::text))
         Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
         Rows Removed by Filter: 1
         Buffers: shared hit=141104 read=26
         ->  BitmapAnd  (cost=27.91..27.91 rows=1 width=0) (actual time=0.368..0.368 rows=0 loops=4235)
               Buffers: shared hit=127019
               ->  Bitmap Index Scan on idx_planet_osm_point_waygeo  (cost=0.00..9.05 rows=137 width=0) (actual time=0.252..0.252 rows=363 loops=4235)
                     Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                     Buffers: shared hit=101609
               ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=0.104..0.104 rows=779 loops=4235)
                     Index Cond: (amenity = 'bar'::text)
                     Buffers: shared hit=25410
 Total runtime: 11238.605 ms
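To see where the time goes, multiply each inner node's average per-loop time by its loop count. All figures come from the plan above. Per-loop times are averages, so the sums are approximate:

```latex
\begin{aligned}
\text{restaurant probes:} &\quad 8811 \times 1.069\ \text{ms} \approx 9419\ \text{ms}\\
\text{of which the } \texttt{amenity = 'restaurant'} \text{ bitmap:} &\quad 8811 \times 0.843\ \text{ms} \approx 7428\ \text{ms}\\
\text{bar probes:} &\quad 4235 \times 0.383\ \text{ms} \approx 1622\ \text{ms}\\
\text{school scan} + \text{restaurant} + \text{bar:} &\quad 151 + 9419 + 1622 \approx 11192\ \text{ms} \quad (\text{reported: } 11238.6\ \text{ms})
\end{aligned}
```

Roughly two thirds of the runtime (about 7.4 s) goes to one step: the bitmap scan of idx_planet_osm_point_amenity that collects all 6291 restaurant rows. That scan is repeated for each of the 8811 schools and then ANDed with the spatial bitmap. The amenity estimates (798 rows for school and for restaurant, versus 8811 and 6291 actual) are far off, which may be why the planner chose this plan.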

I currently use only one table, with 1,372,711 rows. It has 73 columns:

       Column       |         Type         |       Modifiers
--------------------+----------------------+---------------------------
 osm_id             | bigint               | 
 access             | text                 | 
 addr:housename     | text                 | 
 addr:housenumber   | text                 | 
 addr:interpolation | text                 | 
 admin_level        | text                 | 
 aerialway          | text                 | 
 aeroway            | text                 | 
 amenity            | text                 | 
 area               | text                 | 
 barrier            | text                 | 
 bicycle            | text                 | 
 brand              | text                 | 
 bridge             | text                 | 
 boundary           | text                 | 
 building           | text                 | 
 capital            | text                 | 
 construction       | text                 | 
 covered            | text                 | 
 culvert            | text                 | 
 cutting            | text                 | 
 denomination       | text                 | 
 disused            | text                 | 
 ele                | text                 | 
 embankment         | text                 | 
 foot               | text                 | 
 generator:source   | text                 | 
 harbour            | text                 | 
 highway            | text                 | 
 historic           | text                 | 
 horse              | text                 | 
 intermittent       | text                 | 
 junction           | text                 | 
 landuse            | text                 | 
 layer              | text                 | 
 leisure            | text                 | 
 lock               | text                 | 
 man_made           | text                 | 
 military           | text                 | 
 motorcar           | text                 | 
 name               | text                 | 
 natural            | text                 | 
 office             | text                 | 
 oneway             | text                 | 
 operator           | text                 | 
 place              | text                 | 
 poi                | text                 | 
 population         | text                 | 
 power              | text                 | 
 power_source       | text                 | 
 public_transport   | text                 | 
 railway            | text                 | 
 ref                | text                 | 
 religion           | text                 | 
 route              | text                 | 
 service            | text                 | 
 shop               | text                 | 
 sport              | text                 | 
 surface            | text                 | 
 toll               | text                 | 
 tourism            | text                 | 
 tower:type         | text                 | 
 tunnel             | text                 | 
 water              | text                 | 
 waterway           | text                 | 
 wetland            | text                 | 
 width              | text                 | 
 wood               | text                 | 
 z_order            | integer              | 
 tags               | hstore               | 
 way                | geometry(Point,4326) | 
 way_geo            | geography            | 
 gid                | integer              | not null default nextval('...
Indexes:
    "planet_osm_point_pkey1" PRIMARY KEY, btree (gid)
    "idx_planet_osm_point_amenity" btree (amenity)
    "idx_planet_osm_point_waygeo" gist (way_geo)
    "planet_osm_point_index" gist (way)
    "planet_osm_point_pkey" btree (osm_id)

There are 8811, 6291 and 779 rows with amenity school, restaurant and bar, respectively.

Answer 1

Does it make any difference if you use explicit joins?

SELECT a.id as a_id, a.name as a_name, a.geog as a_geog,
       b.id as b_id, b.name as b_name, b.geog as b_geog,
       c.id as c_id, c.name as c_name, c.geog as c_geog
FROM table1 a
JOIN table1 b ON b.type = 'B' AND ST_DWithin(a.geog, b.geog, 100)
JOIN table1 c ON c.type = 'C' AND ST_DWithin(a.geog, c.geog, 100)
WHERE a.type = 'A';
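For inner joins, the comma syntax and explicit JOIN ... ON are logically identical. PostgreSQL normally flattens both into the same join search as long as the number of relations stays within join_collapse_limit (default 8), so any difference is more likely in readability than in the plan. A small sketch below checks that both forms return the same rows. It uses sqlite3 with a hypothetical planar `dwithin` function in place of PostGIS, and hypothetical `x`/`y` meter columns in place of `geog`.

```python
import sqlite3

def dwithin(x1, y1, x2, y2, d):
    # Hypothetical planar stand-in for ST_DWithin; coordinates in meters.
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 <= d * d

conn = sqlite3.connect(":memory:")
conn.create_function("dwithin", 5, dwithin)
conn.execute("CREATE TABLE table1 (id INTEGER, type TEXT, x REAL, y REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (1, "A", 0, 0), (2, "B", 50, 0), (3, "B", 0, 90), (4, "C", 60, 60),
    (5, "A", 1000, 0), (6, "B", 1050, 0),  # second A has a B but no C within 100
])

# Implicit (comma) join with all conditions in WHERE.
comma_join = """
SELECT a.id, b.id, c.id
FROM table1 a, table1 b, table1 c
WHERE a.type = 'A' AND b.type = 'B' AND c.type = 'C'
  AND dwithin(a.x, a.y, b.x, b.y, 100) AND dwithin(a.x, a.y, c.x, c.y, 100)
"""
# Explicit JOIN ... ON, as in the answer.
explicit_join = """
SELECT a.id, b.id, c.id
FROM table1 a
JOIN table1 b ON b.type = 'B' AND dwithin(a.x, a.y, b.x, b.y, 100)
JOIN table1 c ON c.type = 'C' AND dwithin(a.x, a.y, c.x, c.y, 100)
WHERE a.type = 'A'
"""
r1 = sorted(conn.execute(comma_join).fetchall())
r2 = sorted(conn.execute(explicit_join).fetchall())
print(r1 == r2, r1)  # True [(1, 2, 4), (1, 3, 4)]
```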

Answer 2

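Try this with inner join syntax and compare the results; there should be no duplicates. My guess is that it should take 1/3 of the time of the original query, or better: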

select a.id as a_id, a.name as a_name, a.geog as a_geo,
       b.id as b_id, b.name as b_name, b.geog as b_geo,
       c.id as c_id, c.name as c_name, c.geog as c_geo
from table1 as a
INNER JOIN table1 as b on b.type='B'
INNER JOIN table1 as c on c.type='C'
WHERE a.type='A' and
     (ST_DWithin(a.geog, b.geog, 100) and ST_DWithin(a.geog, c.geog, 100));

Answer 3

The 3 sub-selects you use are very inefficient. Write them as LEFT JOIN clauses and the query should be more efficient:

SELECT
  school.osm_id AS school_osm_id, 
  school.name AS school_name, 
  school.way AS school_way, 
  restaurant.osm_id AS restaurant_osm_id, 
  restaurant.name AS restaurant_name, 
  restaurant.way AS restaurant_way, 
  bar.osm_id AS bar_osm_id, 
  bar.name AS bar_name, 
  bar.way AS bar_way 
FROM planet_osm_point school
LEFT JOIN planet_osm_point restaurant ON restaurant.amenity = 'restaurant' AND
                               ST_DWithin(school.way_geo, restaurant.way_geo, 500, false) 
LEFT JOIN planet_osm_point bar ON bar.amenity = 'bar' AND
                               ST_DWithin(school.way_geo, bar.way_geo, 500, false)
WHERE school.amenity = 'school'
  AND (restaurant.osm_id IS NOT NULL OR bar.osm_id IS NOT NULL);
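The OR in the last line keeps schools that have only a restaurant or only a bar within 500 m, whereas the question's query requires both. A small sketch below illustrates the difference. It again uses sqlite3, a hypothetical planar `dwithin` function standing in for ST_DWithin, hypothetical `x`/`y` meter columns, and made-up sample rows.

```python
import sqlite3

def dwithin(x1, y1, x2, y2, d):
    # Hypothetical planar stand-in for ST_DWithin(..., 500, false); meters.
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 <= d * d

conn = sqlite3.connect(":memory:")
conn.create_function("dwithin", 5, dwithin)
conn.execute("CREATE TABLE planet_osm_point (osm_id INTEGER, amenity TEXT, x REAL, y REAL)")
conn.executemany("INSERT INTO planet_osm_point VALUES (?, ?, ?, ?)", [
    (1, "school", 0, 0), (2, "restaurant", 100, 0), (4, "bar", 300, 0),
    (7, "school", 5000, 0), (8, "restaurant", 5100, 0),  # restaurant only
    (9, "school", 9000, 0),                              # nothing nearby
])

# The LEFT JOIN version from the answer.
left_join = """
SELECT school.osm_id, restaurant.osm_id, bar.osm_id
FROM planet_osm_point school
LEFT JOIN planet_osm_point restaurant ON restaurant.amenity = 'restaurant'
     AND dwithin(school.x, school.y, restaurant.x, restaurant.y, 500)
LEFT JOIN planet_osm_point bar ON bar.amenity = 'bar'
     AND dwithin(school.x, school.y, bar.x, bar.y, 500)
WHERE school.amenity = 'school'
  AND (restaurant.osm_id IS NOT NULL OR bar.osm_id IS NOT NULL)
"""
# Inner joins: both a restaurant and a bar are required, as in the question.
inner_join = """
SELECT school.osm_id, restaurant.osm_id, bar.osm_id
FROM planet_osm_point school
JOIN planet_osm_point restaurant ON restaurant.amenity = 'restaurant'
     AND dwithin(school.x, school.y, restaurant.x, restaurant.y, 500)
JOIN planet_osm_point bar ON bar.amenity = 'bar'
     AND dwithin(school.x, school.y, bar.x, bar.y, 500)
WHERE school.amenity = 'school'
"""
rows_left = sorted(conn.execute(left_join).fetchall())
rows_inner = sorted(conn.execute(inner_join).fetchall())
print(rows_left)   # [(1, 2, 4), (7, 8, None)]
print(rows_inner)  # [(1, 2, 4)]
```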

However, if a school has several restaurants and bars nearby, this gives far too many rows. You can simplify the query like this:

SELECT
  school.osm_id AS school_osm_id, 
  school.name AS school_name, 
  school.way AS school_way, 
  a.osm_id AS amenity_osm_id, 
  a.amenity AS amenity_type,
  a.name AS amenity_name, 
  a.way AS amenity_way
FROM planet_osm_point school
JOIN planet_osm_point a ON ST_DWithin(school.way_geo, a.way_geo, 500, false) 
WHERE school.amenity = 'school'
  AND a.amenity IN ('bar', 'restaurant');

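This gives one row for every bar and every restaurant near each school. Schools with neither a restaurant nor a bar within 500 m are not listed.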
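With r restaurants and b bars near a school, the three-way join returns r × b rows for that school, while this simplified version returns only r + b. The sketch below compares the two row counts for one hypothetical school with 3 restaurants and 4 bars nearby. As before, sqlite3, the planar `dwithin` function and the `x`/`y` meter columns are stand-ins for PostGIS.

```python
import sqlite3

def dwithin(x1, y1, x2, y2, d):
    # Hypothetical planar stand-in for ST_DWithin; coordinates in meters.
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 <= d * d

conn = sqlite3.connect(":memory:")
conn.create_function("dwithin", 5, dwithin)
conn.execute("CREATE TABLE planet_osm_point (osm_id INTEGER, amenity TEXT, x REAL, y REAL)")
points = [(1, "school", 0, 0)]
points += [(10 + i, "restaurant", x, y) for i, (x, y) in enumerate([(100, 0), (0, 100), (-100, 0)])]
points += [(20 + i, "bar", x, y) for i, (x, y) in enumerate([(200, 0), (0, 200), (-200, 0), (0, -200)])]
conn.executemany("INSERT INTO planet_osm_point VALUES (?, ?, ?, ?)", points)

# One row per (school, amenity) pair, as in the simplified query.
pairs = conn.execute("""
SELECT school.osm_id, a.osm_id, a.amenity
FROM planet_osm_point school
JOIN planet_osm_point a ON dwithin(school.x, school.y, a.x, a.y, 500)
WHERE school.amenity = 'school'
  AND a.amenity IN ('bar', 'restaurant')
""").fetchall()

# One row per (school, restaurant, bar) triple, as in the three-way join.
triples = conn.execute("""
SELECT school.osm_id, r.osm_id, b.osm_id
FROM planet_osm_point school
JOIN planet_osm_point r ON r.amenity = 'restaurant' AND dwithin(school.x, school.y, r.x, r.y, 500)
JOIN planet_osm_point b ON b.amenity = 'bar' AND dwithin(school.x, school.y, b.x, b.y, 500)
WHERE school.amenity = 'school'
""").fetchall()

print(len(pairs), len(triples))  # 7 12
```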

Answer 4

This query should go a long way (it should be much faster):

WITH school AS (
   SELECT s.osm_id AS school_id, text 'school' AS type, s.osm_id, s.name, s.way_geo
   FROM   planet_osm_point s
        , LATERAL (
      SELECT  1 FROM planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'bar'
      LIMIT   1  -- bar exists -- most selective first if possible
      ) b
        , LATERAL (
      SELECT  1 FROM planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'restaurant'
      LIMIT   1  -- restaurant exists
      ) r
   WHERE  s.amenity = 'school'
   )
SELECT * FROM (
   TABLE school  -- schools

   UNION ALL  -- bars
   SELECT s.school_id, 'bar', x.*
   FROM   school s
        , LATERAL (
      SELECT  osm_id, name, way_geo
      FROM    planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'bar'
      ) x

   UNION ALL  -- restaurants
   SELECT s.school_id, 'rest.', x.*
   FROM   school s
        , LATERAL (
      SELECT  osm_id, name, way_geo
      FROM    planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'restaurant'
      ) x
   ) sub
ORDER BY school_id, (type <> 'school'), type, osm_id;

This is not equivalent to your original query, but it is what you actually asked for:

I want a list of schools that have restaurants and bars within 500 meters and I need the coordinates of each school and its corresponding restaurants and bars.

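So this query returns a list of those schools, each followed by the nearby bars and restaurants. Each set of rows is held together by the school's osm_id in the column school_id.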
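SQLite has no LATERAL, but the existence check performed by the LATERAL (... LIMIT 1) subqueries can be written with EXISTS instead. The sketch below reproduces the same result shape under that substitution. It uses sqlite3 with a hypothetical planar `dwithin` function in place of PostGIS, hypothetical `x`/`y` meter columns instead of `way_geo`, and made-up sample rows.

```python
import sqlite3

def dwithin(x1, y1, x2, y2, d):
    # Hypothetical planar stand-in for ST_DWithin; coordinates in meters.
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 <= d * d

conn = sqlite3.connect(":memory:")
conn.create_function("dwithin", 5, dwithin)
conn.execute("CREATE TABLE planet_osm_point (osm_id INTEGER, amenity TEXT, name TEXT, x REAL, y REAL)")
conn.executemany("INSERT INTO planet_osm_point VALUES (?, ?, ?, ?, ?)", [
    (1, "school", "S1", 0, 0),
    (2, "restaurant", "R2", 100, 0), (3, "restaurant", "R3", 0, 200),
    (4, "bar", "B4", 300, 0),
    (7, "school", "S7", 5000, 0), (8, "restaurant", "R8", 5100, 0),  # no bar nearby
])

QUERY = """
WITH school AS (  -- schools with at least one bar AND one restaurant nearby
  SELECT s.osm_id AS school_id, 'school' AS type, s.osm_id, s.name, s.x, s.y
  FROM planet_osm_point s
  WHERE s.amenity = 'school'
    AND EXISTS (SELECT 1 FROM planet_osm_point b WHERE b.amenity = 'bar'
                AND dwithin(b.x, b.y, s.x, s.y, 500))
    AND EXISTS (SELECT 1 FROM planet_osm_point r WHERE r.amenity = 'restaurant'
                AND dwithin(r.x, r.y, s.x, s.y, 500))
)
SELECT school_id, type, osm_id, name FROM (
  SELECT school_id, type, osm_id, name FROM school            -- schools
  UNION ALL
  SELECT s.school_id, 'bar', p.osm_id, p.name                 -- bars
  FROM school s JOIN planet_osm_point p
    ON p.amenity = 'bar' AND dwithin(p.x, p.y, s.x, s.y, 500)
  UNION ALL
  SELECT s.school_id, 'rest.', p.osm_id, p.name               -- restaurants
  FROM school s JOIN planet_osm_point p
    ON p.amenity = 'restaurant' AND dwithin(p.x, p.y, s.x, s.y, 500)
) sub
ORDER BY school_id, (type <> 'school'), type, osm_id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# (1, 'school', 1, 'S1')
# (1, 'bar', 4, 'B4')
# (1, 'rest.', 2, 'R2')
# (1, 'rest.', 3, 'R3')
```

School 7 is dropped because it has no bar within 500 m. School 1's own row comes first because `(type <> 'school')` evaluates to false for it.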

It now uses LATERAL joins to make use of the spatial GiST index.

TABLE school is just shorthand for SELECT * FROM school.

The expression (type <> 'school') sorts the school first in each set, because FALSE sorts before TRUE. Related:

SQL select query order by day and month

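The subquery sub in the final SELECT is only needed so that we can sort by this expression: an ORDER BY attached to a UNION query may only reference output columns, not expressions.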

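I focused on the query you presented in this question and ignored the extended requirement to filter on any of the other 70 text columns. That is really a design flaw: search criteria should be concentrated in a few columns. Otherwise you would have to index all 70 columns, and multicolumn indexes like the one I am going to propose are hardly an option. Still possible, though ...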

Indexes

In addition to the existing:

"idx_planet_osm_point_waygeo" gist (way_geo)

If you always filter on the same columns, you can create a multicolumn index covering the few columns you are interested in, so that index-only scans become possible:

CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id);
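Demonstrating a PostgreSQL index-only scan needs a running server. SQLite has the same idea under the name "covering index", so the sketch below uses it purely as an analogy, not as a model of the PostgreSQL planner. It shows that a query touching only amenity, name and osm_id can be answered from the index alone, while selecting a column outside the index cannot. The extra `other` column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planet_osm_point (osm_id INTEGER, amenity TEXT, name TEXT, other TEXT)")
conn.execute("CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id)")

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column is the human-readable detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Only indexed columns are needed: SQLite reports a COVERING INDEX.
covered = plan("SELECT osm_id, name FROM planet_osm_point WHERE amenity = 'bar'")
# "other" is not in the index, so the table rows must be visited as well.
not_covered = plan("SELECT osm_id, name, other FROM planet_osm_point WHERE amenity = 'bar'")
print(covered)
print(not_covered)
```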

Postgres 9.5

The upcoming Postgres 9.5 introduces major improvements that address exactly your problem:

Allow queries to perform accurate distance filtering of bounding-box-indexed objects (polygons, circles) using GiST indexes (Alexander Korotkov, Heikki Linnakangas)

Previously, a common table expression was required to return a large number of rows ordered by bounding-box distance, and then filtered further with a more accurate non-bounding-box distance calculation.

Allow GiST indexes to perform index-only scans (Anastasia Lubennikova, Heikki Linnakangas, Andreas Karlsson)

This one is of particular interest to you. Now you can have a single multicolumn (covering) GiST index:

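CREATE EXTENSION IF NOT EXISTS btree_gist;  -- GiST operator classes for the text and bigint columns
CREATE INDEX planet_osm_point_amenity_waygeo_idx ON planet_osm_point
USING gist (amenity, way_geo, name, osm_id);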

And:

Improve bitmap index scan performance (Teodor Sigaev, Tom Lane)

And:

Add GROUP BY analysis functions GROUPING SETS, CUBE and ROLLUP (Andrew Gierth, Atri Sharma)

Why? Because ROLLUP would simplify the query I suggested. Related answer:

Grouping() equivalent in PostgreSQL?

The first alpha version was released on July 2, 2015. The expected timeline for the release:

This is the alpha release of version 9.5, indicating that some changes to features are still possible before release. The PostgreSQL Project will release 9.5 beta 1 in August, and then periodically release additional betas as required for testing until the final release in late 2015.

Basics

Of course, do not neglect the basics:

Slow Query Questions page on the PostgreSQL Wiki
