Postgres SQL GROUP BY without jumping rows?

Let's say I have this data in a table:

 id | thing | operation | timestamp
----+-------+-----------+-----------
  0 | foo   |       add |         0
  0 | bar   |       add |         1
  1 | baz   |    remove |         2
  1 | dim   |       add |         3
  0 | foo   |    remove |         4
  0 | dim   |       add |         5

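Is there any way to construct a Postgres SQL query that groups by id and operation, but without merging rows with a higher timestamp value into a group of rows with lower timestamps when other rows sit in between? I want to get this out of the query: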

 id |  things  | operation
----+----------+-----------
  0 | foo, bar |       add
  1 |      baz |    remove
  1 |      dim |       add
  0 |      foo |    remove
  0 |      dim |       add

Basically a GROUP BY, but only over adjacent rows when sorted by timestamp.

If your sample data is good enough, perhaps this will do:

select id, string_agg(thing,',') as things, operation
from tablename
group by id, operation

I.e. use id and operation to find the things to concatenate.

Edited: now using string_agg instead of group_concat.
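
For comparison, here is a rough Python sketch (not Postgres itself) of what grouping by id and operation alone yields on the sample rows from the question. The function name `plain_group_by` is made up for illustration, and the rows are concatenated in timestamp order, which a string_agg without ORDER BY does not guarantee.

```python
# Sample rows from the question: (id, thing, operation, timestamp)
rows = [
    (0, "foo", "add", 0),
    (0, "bar", "add", 1),
    (1, "baz", "remove", 2),
    (1, "dim", "add", 3),
    (0, "foo", "remove", 4),
    (0, "dim", "add", 5),
]

def plain_group_by(rows):
    """Group on (id, operation) only, ignoring whether rows are adjacent."""
    groups = {}
    for id_, thing, operation, _ts in sorted(rows, key=lambda r: r[3]):
        groups.setdefault((id_, operation), []).append(thing)
    return {key: ",".join(things) for key, things in groups.items()}

for (id_, operation), things in plain_group_by(rows).items():
    print(id_, things, operation)
```

On this data the (0, add) group collects foo, bar and dim together, so the row with timestamp 5 is not kept separate.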

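This is a gaps and islands problem (this article is aimed at SQL Server, but it describes the problem very well, so it still applies to PostgreSQL), and it can be solved using ranking functions: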

SELECT  id,
        thing,
        operation,
        timestamp,
        ROW_NUMBER() OVER(ORDER BY timestamp) - 
                ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY Timestamp) AS groupingSet,
        ROW_NUMBER() OVER(ORDER BY timestamp) AS PositionInSet,
        ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY Timestamp) AS PositionInGroup
FROM    T
ORDER BY timestamp;

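As you can see, by taking the overall position in the set and subtracting the position within the group, you can identify the islands: each unique combination of (id, operation, groupingset) represents one island: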

id  thing   operation   timestamp   groupingSet PositionInSet   PositionInGroup
0   foo     add         0           0           1               1
0   bar     add         1           0           2               2           
1   baz     remove      2           2           3               1
1   dim     add         3           3           4               1
0   foo     remove      4           4           5               1
0   dim     add         5           3           6               3
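
The subtraction can be checked by hand with a small Python sketch. This is only an illustration of the arithmetic, not something Postgres runs: the function name `grouping_sets` is hypothetical, and the running counter per (id, operation) stands in for the partitioned ROW_NUMBER().

```python
from collections import defaultdict

# Sample rows from the question: (id, thing, operation, timestamp)
rows = [
    (0, "foo", "add", 0),
    (0, "bar", "add", 1),
    (1, "baz", "remove", 2),
    (1, "dim", "add", 3),
    (0, "foo", "remove", 4),
    (0, "dim", "add", 5),
]

def grouping_sets(rows):
    """Overall position minus position within (id, operation), per row."""
    position_in_group = defaultdict(int)  # running count per (id, operation)
    result = []
    ordered = sorted(rows, key=lambda r: r[3])  # ORDER BY timestamp
    for position_in_set, (id_, _thing, operation, _ts) in enumerate(ordered, start=1):
        position_in_group[(id_, operation)] += 1
        result.append(position_in_set - position_in_group[(id_, operation)])
    return result

print(grouping_sets(rows))  # [0, 0, 2, 3, 4, 3]
```

The two (0, add) rows at the start share the value 0, while the later (0, add) row gets 3, which is why it ends up in a different island.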

Then you just need to put this in a subquery, group by the relevant fields, and use string_agg to concatenate your things:

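-- STRING_AGG needs an explicit delimiter; ORDER BY inside it fixes the order of things
SELECT  id, STRING_AGG(thing, ', ' ORDER BY timestamp) AS things, operation
FROM    (   SELECT  id,
                    thing,
                    operation,
                    timestamp,
                    ROW_NUMBER() OVER(ORDER BY timestamp) - 
                            ROW_NUMBER() OVER(PARTITION BY id, operation ORDER BY timestamp) AS groupingSet
            FROM    T
        ) AS t
GROUP BY id, operation, groupingset
ORDER BY MIN(timestamp);  -- return the islands in timestamp order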
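
As a cross-check of the expected result, the same "group only adjacent rows" idea can be sketched in Python with itertools.groupby, which only merges consecutive items that share a key. This is a sketch under the assumption that the rows are already loaded as tuples; the name `group_adjacent` is made up for illustration.

```python
from itertools import groupby

# Sample rows from the question: (id, thing, operation, timestamp)
rows = [
    (0, "foo", "add", 0),
    (0, "bar", "add", 1),
    (1, "baz", "remove", 2),
    (1, "dim", "add", 3),
    (0, "foo", "remove", 4),
    (0, "dim", "add", 5),
]

def group_adjacent(rows):
    """Merge runs of consecutive rows (by timestamp) sharing id and operation."""
    ordered = sorted(rows, key=lambda r: r[3])
    return [
        (id_, ", ".join(r[1] for r in run), operation)
        for (id_, operation), run in groupby(ordered, key=lambda r: (r[0], r[2]))
    ]

for row in group_adjacent(rows):
    print(row)
```

This yields the five rows asked for in the question, with foo and bar merged and the later (0, add) row kept on its own.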

You can count the distinct operations per id, and use that count to UNION two selects against the table:

WITH cnt AS (
  SELECT id, operations_cnt FROM (
    SELECT id, array_length(array_agg(DISTINCT operation),1) AS operations_cnt
    FROM test GROUP BY id
  ) AS t
  WHERE operations_cnt=1
)
SELECT id, string_agg(things, ','), operation, MAX(timestamp) AS timestamp
FROM test
WHERE id IN (SELECT id FROM cnt) GROUP BY id, operation
UNION ALL
SELECT id, things, operation, timestamp
FROM test
WHERE id NOT IN (SELECT id FROM cnt)
ORDER BY timestamp;

Result:

 id | string_agg | operation | timestamp 
----+------------+-----------+-----------
  0 | foo,bar    | add       |         1
  1 | baz        | remove    |         2
  1 | dim        | add       |         3
  2 | foo        | remove    |         4
  2 | dim        | add       |         5
(5 rows)