Select from table removing similar rows - PostgreSQL

Question

I have a table containing document revisions and their authors. It looks like this:

doc_id    rev_id   rev_date            editor    title, content, and so on...
123        1       2016-01-01 03:20    Bill        ......
123        2       2016-01-01 03:40    Bill
123        3       2016-01-01 03:50    Bill
123        4       2016-01-01 04:10    Bill
123        5       2016-01-01 08:40    Alice
123        6       2016-01-01 08:41    Alice
123        7       2016-01-01 09:00    Bill
123        8       2016-01-01 10:40    Cate
942        9       2016-01-01 11:10    Alice
942       10       2016-01-01 11:15    Bill
942       15       2016-01-01 11:17    Bill

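I need to find the moments when a document was handed over to another editor, that is, only the first row of each consecutive series of revisions by the same editor.

Like this: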

doc_id    rev_id   rev_date            editor    title, content, and so on...
123        1       2016-01-01 03:20    Bill        ......
123        5       2016-01-01 08:40    Alice
123        7       2016-01-01 09:00    Bill
123        8       2016-01-01 10:40    Cate
942        9       2016-01-01 11:10    Alice
942       10       2016-01-01 11:15    Bill

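If I use DISTINCT ON (doc_id, editor), it re-sorts the table and I see only one row per document and editor, which is incorrect: Bill's second series for document 123 (rev_id 7) is lost. Of course, I could dump everything and filter it with shell tools such as awk | sort | uniq, but that is not good for big tables.

Window functions like FIRST_ROW do not help much, because I cannot partition by doc_id, editor without mixing all of an editor's series together.

How can I do this better?

Thanks.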

Answer 1

You can use lag() to get the previous row's value, then simply compare it with the current one:

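select t.*
from (select t.*,
             -- editor of the previous revision of the same document (null for the first one)
             lag(editor) over (partition by doc_id order by rev_date) as prev_editor
      from t
     ) t
-- keep a row only when it starts a new series
where prev_editor is null or prev_editor <> editor;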
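
To try this against the sample data from the question, a minimal self-contained sketch could look like the following. The table name t comes from the query above; the column types, the temporary table, and the rev_id tie-breaker in the ordering are assumptions made for illustration and are not part of the original post. It also uses IS DISTINCT FROM, which treats a null prev_editor as different from any non-null editor, so the separate null check is not needed.

```sql
-- Hypothetical schema: only the columns needed for the comparison.
-- Column types are assumptions, not taken from the question.
create temp table t (
    doc_id   integer,
    rev_id   integer,
    rev_date timestamp,
    editor   text
);

-- Sample rows copied from the question.
insert into t (doc_id, rev_id, rev_date, editor) values
    (123,  1, '2016-01-01 03:20', 'Bill'),
    (123,  2, '2016-01-01 03:40', 'Bill'),
    (123,  3, '2016-01-01 03:50', 'Bill'),
    (123,  4, '2016-01-01 04:10', 'Bill'),
    (123,  5, '2016-01-01 08:40', 'Alice'),
    (123,  6, '2016-01-01 08:41', 'Alice'),
    (123,  7, '2016-01-01 09:00', 'Bill'),
    (123,  8, '2016-01-01 10:40', 'Cate'),
    (942,  9, '2016-01-01 11:10', 'Alice'),
    (942, 10, '2016-01-01 11:15', 'Bill'),
    (942, 15, '2016-01-01 11:17', 'Bill');

-- Keep only the first row of each consecutive series per document.
select doc_id, rev_id, rev_date, editor
from (select t.*,
             -- rev_id is added as a tie-breaker (an assumption) so the
             -- order is deterministic if two revisions share a rev_date
             lag(editor) over (partition by doc_id
                               order by rev_date, rev_id) as prev_editor
      from t
     ) s
-- true for the first revision of a document (prev_editor is null)
-- and for every revision whose editor differs from the previous one
where prev_editor is distinct from editor
order by doc_id, rev_date, rev_id;
```

On this data the query should return rev_id 1, 5, 7, 8 for document 123 and rev_id 9, 10 for document 942, matching the desired output in the question. The two filters differ only if editor itself can be null: IS DISTINCT FROM treats two nulls as equal, whereas the <> comparison yields null for such rows and so drops them.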
