Divide table rows into chunks in Postgres with ST_DWithin limit

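I have a table with linestrings and I want to divide it into chunks, where each chunk's id list is no longer than a given size and contains only lines that lie within a certain distance of each other.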

For example, I have a table with 14 rows:

    create table lines ( id integer primary key, geom geometry(linestring) );
    insert into lines (id, geom) values ( 1, 'LINESTRING(0 0, 0 1)');
    insert into lines (id, geom) values ( 2, 'LINESTRING(0 1, 1 1)');
    insert into lines (id, geom) values ( 3, 'LINESTRING(1 1, 1 2)');
    insert into lines (id, geom) values ( 4, 'LINESTRING(1 2, 2 2)');
    insert into lines (id, geom) values ( 11, 'LINESTRING(2 2, 2 3)');
    insert into lines (id, geom) values ( 12, 'LINESTRING(2 3, 3 3)');
    insert into lines (id, geom) values ( 13, 'LINESTRING(3 3, 3 4)');
    insert into lines (id, geom) values ( 14, 'LINESTRING(3 4, 4 4)');
    create index lines_gix on lines using gist(geom);

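I want to divide it into chunks of 3 ids each, where the lines in a chunk are within 2 meters of each other (or of the first line in the chunk).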

The result I am trying to get from this example is:

| Chunk No.|  Id chunk list |
|----------|----------------|
|     1    |    1, 2, 3     |
|     2    |    4, 5, 6     |
|     3    |    7, 8, 9     |
|     4    |   10, 11, 12   |
|     5    |      13, 14    |
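The chunking rule leaves some room for interpretation. One reading, used here only for illustration: a chunk holds at most N ids, and every line after the first is within the distance of at least one line placed in the chunk before it. The pure-Python checker below is a minimal sketch of that reading, assuming each two-point LINESTRING can be treated as a 2D segment; `segment_distance` and `is_valid_chunking` are hypothetical helpers, not PostGIS functions, and distances are in coordinate units because the sample geometries have no SRID. It only covers the eight rows inserted above.

```python
import math

Point = tuple[float, float]
Segment = tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closed segment seg."""
    (px, py), ((ax, ay), (bx, by)) = p, seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's extent.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def orient(a: Point, b: Point, c: Point) -> float:
    """Sign tells on which side of the line a->b the point c lies."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segment_distance(s: Segment, t: Segment) -> float:
    """2D distance between two segments (what ST_Distance gives for 2-point lines)."""
    (a, b), (c, d) = s, t
    # Proper crossing: distance 0. Otherwise the minimum is reached at an endpoint.
    if orient(c, d, a) * orient(c, d, b) < 0 and orient(a, b, c) * orient(a, b, d) < 0:
        return 0.0
    return min(point_segment_distance(a, t), point_segment_distance(b, t),
               point_segment_distance(c, s), point_segment_distance(d, s))


def is_valid_chunking(chunks: list[list[int]], geoms: dict[int, Segment],
                      max_size: int, max_dist: float) -> bool:
    """Every chunk has 1..max_size ids, every id is used exactly once, and each
    id after the first is within max_dist of some id placed earlier in its chunk."""
    seen: set[int] = set()
    for chunk in chunks:
        if not 1 <= len(chunk) <= max_size:
            return False
        for i, line_id in enumerate(chunk):
            if line_id in seen:
                return False
            seen.add(line_id)
            if i > 0 and all(segment_distance(geoms[line_id], geoms[prev]) > max_dist
                             for prev in chunk[:i]):
                return False
    return seen == set(geoms)


# The eight sample rows from the question, as (start, end) points.
LINES: dict[int, Segment] = {
    1: ((0, 0), (0, 1)), 2: ((0, 1), (1, 1)), 3: ((1, 1), (1, 2)),
    4: ((1, 2), (2, 2)), 11: ((2, 2), (2, 3)), 12: ((2, 3), (3, 3)),
    13: ((3, 3), (3, 4)), 14: ((3, 4), (4, 4)),
}

print(is_valid_chunking([[1, 2, 3], [4, 11, 12], [13, 14]], LINES, 3, 2.0))  # True
print(is_valid_chunking([[1, 11, 2], [3, 4, 12], [13, 14]], LINES, 3, 2.0))  # False
```

The second call fails because line 11 is about 2.24 units away from line 1, the only line placed before it in that chunk.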

I tried using ST_ClusterWithin, but when the lines are close to each other it returns all of them in one cluster instead of dividing them into chunks.
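This is expected from how `ST_ClusterWithin` groups geometries: two lines land in the same cluster whenever a chain of lines, each within the distance of the next, connects them, and there is no cap on cluster size. A minimal Python sketch of that single-linkage rule (union-find over a hypothetical `segment_distance` helper; it models the behaviour, not the PostGIS implementation) on the sample rows:

```python
import math

Point = tuple[float, float]
Segment = tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closed segment seg."""
    (px, py), ((ax, ay), (bx, by)) = p, seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def orient(a: Point, b: Point, c: Point) -> float:
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segment_distance(s: Segment, t: Segment) -> float:
    """2D distance between two segments; 0 if they properly cross."""
    (a, b), (c, d) = s, t
    if orient(c, d, a) * orient(c, d, b) < 0 and orient(a, b, c) * orient(a, b, d) < 0:
        return 0.0
    return min(point_segment_distance(a, t), point_segment_distance(b, t),
               point_segment_distance(c, s), point_segment_distance(d, s))


def cluster_within(geoms: dict[int, Segment], max_dist: float) -> list[list[int]]:
    """Single-linkage clustering: ids share a cluster whenever a chain of
    pairwise distances <= max_dist connects them. No size limit."""
    parent = {i: i for i in geoms}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    ids = sorted(geoms)
    for x, a in enumerate(ids):
        for b in ids[x + 1:]:
            if segment_distance(geoms[a], geoms[b]) <= max_dist:
                parent[find(a)] = find(b)  # merge the two clusters
    clusters: dict[int, list[int]] = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


LINES: dict[int, Segment] = {
    1: ((0, 0), (0, 1)), 2: ((0, 1), (1, 1)), 3: ((1, 1), (1, 2)),
    4: ((1, 2), (2, 2)), 11: ((2, 2), (2, 3)), 12: ((2, 3), (3, 3)),
    13: ((3, 3), (3, 4)), 14: ((3, 4), (4, 4)),
}

print(cluster_within(LINES, 2.0))  # [[1, 2, 3, 4, 11, 12, 13, 14]]
print(cluster_within(LINES, 0.0))  # [[1, 2, 3, 4, 11, 12, 13, 14]]
# Hypothetical extra line far away from the others.
print(cluster_within({**LINES, 99: ((10, 10), (10, 11))}, 2.0))
# [[1, 2, 3, 4, 11, 12, 13, 14], [99]]
```

Every sample line touches the next one, so even a distance of 0 yields a single cluster; only an isolated line forms a cluster of its own.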

I also tried some approaches with recursive magic, such as the one in the answer Paul Ramsey provided here, but I don't know how to modify that query to return a size-limited grouped id list.

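I'm not sure this is the best answer, so if anyone has a better approach or knows how to improve this one, feel free to update it. With a small modification of Paul's answer, I managed to create the following query, which does what I need.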

    -- Create function for easier interaction:
    -- find_connected(start_id, max_distance, max_ids, excluded_ids)
    CREATE OR REPLACE FUNCTION find_connected(integer, double precision, integer, integer[])
      RETURNS integer[] AS
    $$
    -- RECURSIVE lets the query reuse its own output: each iteration appends
    -- new rows to the result and feeds them back into the next iteration
    WITH RECURSIVE lines_r AS
        (SELECT ARRAY[id] AS idlist,
                geom, id
         FROM lines
         WHERE id = $1                                            -- start from the requested line
         UNION ALL
         SELECT array_append(lines_r.idlist, lines.id) AS idlist, -- append the new id to the path
                lines.geom                             AS geom,   -- keep the geometry of the last line
                lines.id                               AS id      -- keep the source table id
         FROM (SELECT * FROM lines WHERE NOT $4 @> ARRAY[id]) lines, lines_r -- skip excluded ids
         WHERE ST_DWithin(lines.geom, lines_r.geom, $2)           -- next line within $2 of the previous one
           AND NOT lines_r.idlist @> ARRAY[lines.id]              -- do not revisit a line in this path
           AND array_length(lines_r.idlist, 1) < $3               -- stop growing once a path holds $3 ids
        )
    SELECT idlist
    FROM lines_r
    WHERE array_length(idlist, 1) <= $3
    ORDER BY array_length(idlist, 1) DESC
    LIMIT 1;
    $$
    LANGUAGE sql;
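To see what find_connected computes, here is a rough Python model of its recursive CTE. It assumes the "within $2" relation has already been turned into an adjacency map (in the database, ST_DWithin decides this); `NEAR` was worked out by hand for the eight sample rows with a distance of 2, and `find_connected` here is only a model of the SQL function. Each path may only step to a line near the previous one, skips the `excluded` ids ($4), stops at `max_len` ids ($3), and the longest path wins.

```python
# Adjacency of the sample rows: NEAR[a] holds the ids whose distance to a is <= 2.
# Hand-computed for the 8 inserted lines; an assumption standing in for ST_DWithin.
NEAR: dict[int, set[int]] = {
    1: {2, 3, 4},
    2: {1, 3, 4, 11},
    3: {1, 2, 4, 11, 12},
    4: {1, 2, 3, 11, 12, 13},
    11: {2, 3, 4, 12, 13, 14},
    12: {3, 4, 11, 13, 14},
    13: {4, 11, 12, 14},
    14: {11, 12, 13},
}


def find_connected(near: dict[int, set[int]], start: int, max_len: int,
                   excluded: set[int]) -> list[int]:
    """Model of find_connected($1=start, $2 baked into `near`, $3=max_len, $4=excluded).

    Enumerates every simple path that begins at `start`, steps only to a line
    near the last one, avoids `excluded`, and has at most `max_len` ids; returns
    the first longest path it finds (depth-first order)."""
    best = [start]
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > len(best):
            best = path
        if len(path) == max_len:
            continue  # stop extending once the path holds max_len ids
        for nxt in sorted(near[path[-1]]):
            if nxt not in excluded and nxt not in path:
                stack.append(path + [nxt])
    return best


print(find_connected(NEAR, 1, 3, {1}))  # [1, 4, 13]
```

Several paths of length 3 start at line 1; the SQL version breaks that tie arbitrarily through ORDER BY ... LIMIT 1, while the model keeps the first one it meets, which is [1, 4, 13] rather than [1, 2, 3].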

    -- Create id chunks (2 = max distance, 3 = max ids per chunk)
    WITH RECURSIVE groups_r AS (
        (SELECT find_connected(id, 2, 3, ARRAY[id]) AS idlist,    -- all ids assigned so far
                find_connected(id, 2, 3, ARRAY[id]) AS grouplist, -- ids of the current chunk
                id
         FROM lines
         WHERE id = 1)
        UNION ALL
        (SELECT array_cat(groups_r.idlist, find_connected(lines.id, 2, 3, groups_r.idlist)) AS idlist,
                find_connected(lines.id, 2, 3, groups_r.idlist)                            AS grouplist,
                lines.id
         FROM lines,
              groups_r
         WHERE NOT groups_r.idlist @> ARRAY[lines.id] -- start the next chunk from any unassigned line
         LIMIT 1))
    SELECT
    --     (SELECT array_agg(DISTINCT x) FROM unnest(idlist) t (x))    idlist, -- left in to show what is happening
           row_number() OVER ()                                        chunk_id,
           (SELECT array_agg(DISTINCT x) FROM unnest(grouplist) t (x)) grouplist,
           id                                                          input_line_id
    FROM groups_r;
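The chunk query repeats find_connected from a line that is not assigned yet, excluding every id already placed in an earlier chunk, until all ids are used. Below is a Python model of that loop; it again assumes a hand-computed adjacency map `NEAR` for distance 2 on the eight sample rows, and `chunk_ids` is a hypothetical name. The SQL picks the next start line with LIMIT 1 and no ORDER BY, so its choice is unspecified; the model takes the smallest unassigned id instead.

```python
# Hand-computed "within 2 units" adjacency for the sample rows (assumption).
NEAR: dict[int, set[int]] = {
    1: {2, 3, 4}, 2: {1, 3, 4, 11}, 3: {1, 2, 4, 11, 12},
    4: {1, 2, 3, 11, 12, 13}, 11: {2, 3, 4, 12, 13, 14},
    12: {3, 4, 11, 13, 14}, 13: {4, 11, 12, 14}, 14: {11, 12, 13},
}


def find_connected(near: dict[int, set[int]], start: int, max_len: int,
                   excluded: set[int]) -> list[int]:
    """First longest simple path from start (at most max_len ids, no excluded ids)."""
    best, stack = [start], [[start]]
    while stack:
        path = stack.pop()
        if len(path) > len(best):
            best = path
        if len(path) < max_len:
            for nxt in sorted(near[path[-1]]):
                if nxt not in excluded and nxt not in path:
                    stack.append(path + [nxt])
    return best


def chunk_ids(near: dict[int, set[int]], max_len: int) -> list[list[int]]:
    """Model of the groups_r loop: one find_connected call per chunk,
    excluding everything assigned to earlier chunks."""
    assigned: set[int] = set()
    chunks: list[list[int]] = []
    for start in sorted(near):  # assumption: smallest unassigned id goes first
        if start in assigned:
            continue
        chunk = find_connected(near, start, max_len, assigned)
        chunks.append(chunk)
        assigned.update(chunk)
    return chunks


print(chunk_ids(NEAR, 3))  # [[1, 4, 13], [2, 11, 14], [3, 12]]
```

Each chunk respects the size and distance limits, but nothing in this approach prefers consecutive ids, and taking the longest path first can leave short leftover chunks such as [3, 12].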

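The only problem is that performance gets very poor as the number of ids per chunk increases. For a table with 300 rows and 20 ids per chunk, execution takes about 15 minutes, even with indexes on the geometry and id columns.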
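Part of the slowdown follows from how find_connected works: its recursive CTE has to produce every simple path of up to $3 lines before ORDER BY ... LIMIT 1 can keep one, and the chunk query calls find_connected twice per chunk. In the worst case, where all lines around the start are within the distance of each other, the number of paths grows factorially with the chunk size. The Python sketch below counts the rows the CTE would generate in that worst case; `clique`, `count_cte_rows` and `clique_rows` are illustrative names, and real data is usually much sparser than a clique, so treat the numbers as an upper bound rather than a prediction.

```python
import math


def count_cte_rows(near: dict[int, set[int]], start: int, max_len: int) -> int:
    """Rows the recursive CTE yields from `start`: one per simple path of
    1..max_len lines where each step goes to a line near the previous one."""
    count = 0
    stack = [[start]]
    while stack:
        path = stack.pop()
        count += 1
        if len(path) < max_len:
            stack.extend(path + [n] for n in near[path[-1]] if n not in path)
    return count


def clique(n: int) -> dict[int, set[int]]:
    """Worst case: n lines that are all within the distance of each other."""
    return {i: set(range(n)) - {i} for i in range(n)}


def clique_rows(n: int, max_len: int) -> int:
    """Closed form for a clique: sum over path lengths j of P(n - 1, j - 1)."""
    return sum(math.perm(n - 1, j - 1) for j in range(1, max_len + 1))


# The explicit enumeration agrees with the closed form on small cliques.
for n, k in [(5, 3), (8, 5), (10, 6)]:
    print(n, k, count_cte_rows(clique(n), 0, k), clique_rows(n, k))

# 21 mutually close lines, chunks of 20: far too many paths to enumerate.
print(f"{clique_rows(21, 20):.2e}")  # 4.18e+18
```

A greedy strategy that grows each chunk one nearby line at a time, instead of searching for the longest path, would avoid this enumeration at the cost of possibly less compact chunks.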