Is there a way to ensure WHERE clause happens after DISTINCT?

Suppose you have a table comments in your database.

The comments table contains the columns id, text, show and comment_id_no (plus the inserted_at timestamp shown below).

When a user enters a comment, a row is inserted into the database:

| id |  comment_id_no | text | show | inserted_at |
| -- | -------------- | ---- | ---- | ----------- |
| 1  | 1              | hi   | true | 1/1/2000    |

If the user wants to update that comment, a new row is inserted into the database:

| id |  comment_id_no | text | show | inserted_at |
| -- | -------------- | ---- | ---- | ----------- |
| 1  | 1              | hi   | true | 1/1/2000    |
| 2  | 1              | hey  | true | 1/1/2001    |

Notice that comment_id_no stays the same. This way we can see the history of the comment.

Now the user decides they no longer want to display their comment:

| id |  comment_id_no | text | show  | inserted_at |
| -- | -------------- | ---- | ----- | ----------- |
| 1  | 1              | hi   | true  | 1/1/2000    |
| 2  | 1              | hey  | true  | 1/1/2001    |
| 3  | 1              | hey  | false | 1/1/2002    |

This hides the comment from end users.

Now a second comment is posted (not an update of the first one):

| id |  comment_id_no | text | show  | inserted_at |
| -- | -------------- | ---- | ----- | ----------- |
| 1  | 1              | hi   | true  | 1/1/2000    |
| 2  | 1              | hey  | true  | 1/1/2001    |
| 3  | 1              | hey  | false | 1/1/2002    |
| 4  | 2              | new  | true  | 1/1/2003    |

What I want to be able to do is select the latest version of each unique comment_id_no, where show equals true. However, I don't want the query to return id=2.

The steps the query needs to perform...

  1. Select all the most recent rows for each distinct comment_id_no (should return id=3 and id=4).
  2. Of those, select where show = true (should only return id=4).
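The order of the two steps is the whole point. As a rough plain-Python sketch on the sample rows (the dict keys mirror the table's columns; the function names are hypothetical), filtering on show before picking the latest version resurfaces the stale id=2 row:

```python
# The sample rows from the question (dates as ISO strings so they compare chronologically).
rows = [
    {"id": 1, "comment_id_no": 1, "text": "hi",  "show": True,  "inserted_at": "2000-01-01"},
    {"id": 2, "comment_id_no": 1, "text": "hey", "show": True,  "inserted_at": "2001-01-01"},
    {"id": 3, "comment_id_no": 1, "text": "hey", "show": False, "inserted_at": "2002-01-01"},
    {"id": 4, "comment_id_no": 2, "text": "new", "show": True,  "inserted_at": "2003-01-01"},
]

def latest_per_comment(rows):
    """Step 1: keep only the most recent row for each comment_id_no."""
    latest = {}
    for row in rows:
        key = row["comment_id_no"]
        if key not in latest or row["inserted_at"] > latest[key]["inserted_at"]:
            latest[key] = row
    return list(latest.values())

def visible_latest(rows):
    """Step 2 runs on the output of step 1: filter on show afterwards."""
    return [r for r in latest_per_comment(rows) if r["show"]]

def wrong_order(rows):
    """Filtering first and then picking the latest brings back the old version."""
    return latest_per_comment([r for r in rows if r["show"]])

print([r["id"] for r in latest_per_comment(rows)])  # [3, 4]
print([r["id"] for r in visible_latest(rows)])      # [4]
print([r["id"] for r in wrong_order(rows)])         # [2, 4]
```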

Note: I am actually writing this query in elixir using ecto and would like to be able to do this without using the subquery function. If anyone can answer this in sql I can convert the answer myself. If anyone knows how to answer this in elixir then also feel free to answer.

I think you want:

select c.*
from comments c
where c.inserted_at = (select max(c2.inserted_at)
                       from comments c2
                       where c2.comment_id_no = c.comment_id_no
                      ) and
      c.show = 'true';

I don't see what this has to do with select distinct. You just want the latest version of each comment, and then to check whether it can be shown.
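A quick way to check the correlated-subquery version against the sample data is an in-memory SQLite database standing in for Postgres. This is an assumption for illustration only: SQLite has no real boolean, so show is stored as 1/0 and c.show = 'true' becomes c.show = 1, and dates are stored as ISO text so max() and comparisons are chronological.

```python
import sqlite3

# Sample rows from the question: (id, comment_id_no, text, show, inserted_at).
ROWS = [
    (1, 1, "hi", 1, "2000-01-01"),
    (2, 1, "hey", 1, "2001-01-01"),
    (3, 1, "hey", 0, "2002-01-01"),
    (4, 2, "new", 1, "2003-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, comment_id_no INTEGER, "
    "text TEXT, show INTEGER, inserted_at TEXT)"
)
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", ROWS)

# The correlated subquery finds the latest inserted_at per comment_id_no;
# the show filter is then applied to that single row only.
LATEST_VISIBLE = """
    SELECT c.id
    FROM comments c
    WHERE c.inserted_at = (SELECT max(c2.inserted_at)
                           FROM comments c2
                           WHERE c2.comment_id_no = c.comment_id_no)
      AND c.show = 1
    ORDER BY c.id
"""
ids = [row[0] for row in conn.execute(LATEST_VISIBLE)]
print(ids)  # [4]
```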

Edit:

In Postgres, I would do it like this:

select c.*
from (select distinct on (comment_id_no) c.*
      from comments c
      order by c.comment_id_no, c.inserted_at desc
     ) c
where c.show

distinct on usually has very good performance characteristics.
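SQLite and most other engines have no DISTINCT ON, so here is a plain-Python sketch of its semantics under the ORDER BY used above (the function name is hypothetical): sort by comment_id_no, then inserted_at descending, keep the first row of each group, and only then apply the outer show filter.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"id": 1, "comment_id_no": 1, "text": "hi",  "show": True,  "inserted_at": "2000-01-01"},
    {"id": 2, "comment_id_no": 1, "text": "hey", "show": True,  "inserted_at": "2001-01-01"},
    {"id": 3, "comment_id_no": 1, "text": "hey", "show": False, "inserted_at": "2002-01-01"},
    {"id": 4, "comment_id_no": 2, "text": "new", "show": True,  "inserted_at": "2003-01-01"},
]

def distinct_on_latest(rows):
    """Mimic DISTINCT ON (comment_id_no) ... ORDER BY comment_id_no, inserted_at DESC."""
    # Sort by the secondary key first; Python's sort is stable, so the second
    # sort by comment_id_no keeps inserted_at descending inside each group.
    ordered = sorted(rows, key=itemgetter("inserted_at"), reverse=True)
    ordered.sort(key=itemgetter("comment_id_no"))
    # DISTINCT ON keeps only the first row of each group in that order.
    return [next(group) for _, group in groupby(ordered, key=itemgetter("comment_id_no"))]

# The outer "where c.show" only sees the deduplicated rows.
visible = [r for r in distinct_on_latest(rows) if r["show"]]
print([r["id"] for r in visible])  # [4]
```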

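If you are running Postgres 8.4 or later, ROW_NUMBER() is usually the most efficient solution. Apply the show filter in the outer query, after the ranking; filtering inside the subquery would rank only the visible rows and bring back the stale id=2 version:

SELECT *
FROM (
    SELECT c.*, ROW_NUMBER() OVER(PARTITION BY comment_id_no ORDER BY inserted_at DESC) rn
    FROM comments c
) x
WHERE rn = 1
  AND x.show = 'true'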
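To see why the show filter has to sit outside the ranking subquery, both placements can be compared on the sample data. This sketch uses an in-memory SQLite database as a stand-in for Postgres (an assumption: it needs SQLite 3.25+ for window functions, stores show as 1/0, and keeps dates as ISO text).

```python
import sqlite3

ROWS = [
    (1, 1, "hi", 1, "2000-01-01"),
    (2, 1, "hey", 1, "2001-01-01"),
    (3, 1, "hey", 0, "2002-01-01"),
    (4, 2, "new", 1, "2003-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, comment_id_no INTEGER, "
    "text TEXT, show INTEGER, inserted_at TEXT)"
)
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", ROWS)

# Rank every version first, then keep rank 1 only if it is visible.
FILTER_AFTER_RANKING = """
    SELECT id FROM (
        SELECT c.*, ROW_NUMBER() OVER (PARTITION BY comment_id_no
                                       ORDER BY inserted_at DESC) AS rn
        FROM comments c
    ) x
    WHERE rn = 1 AND x.show = 1
    ORDER BY id
"""

# Filtering inside the subquery ranks only the visible versions,
# so the hidden id=3 no longer shadows id=2.
FILTER_BEFORE_RANKING = """
    SELECT id FROM (
        SELECT c.*, ROW_NUMBER() OVER (PARTITION BY comment_id_no
                                       ORDER BY inserted_at DESC) AS rn
        FROM comments c
        WHERE c.show = 1
    ) x
    WHERE rn = 1
    ORDER BY id
"""

fixed_ids = [r[0] for r in conn.execute(FILTER_AFTER_RANKING)]
filter_first_ids = [r[0] for r in conn.execute(FILTER_BEFORE_RANKING)]
print(fixed_ids, filter_first_ids)  # [4] [2, 4]
```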

Otherwise, this can also be done with a WHERE NOT EXISTS condition, which ensures you are showing the latest comment:

SELECT c.*
FROM comments c
WHERE 
    c.show = 'true'
    AND NOT EXISTS (
        SELECT 1 
        FROM comments c1 
        WHERE c1.comment_id_no = c.comment_id_no AND c1.inserted_at > c.inserted_at
    )
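The NOT EXISTS version works because the inner query checks for any later row regardless of show, so the hidden id=3 still disqualifies id=2. A sketch against an in-memory SQLite stand-in (assumed: show stored as 1/0, ISO date strings):

```python
import sqlite3

ROWS = [
    (1, 1, "hi", 1, "2000-01-01"),
    (2, 1, "hey", 1, "2001-01-01"),
    (3, 1, "hey", 0, "2002-01-01"),
    (4, 2, "new", 1, "2003-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, comment_id_no INTEGER, "
    "text TEXT, show INTEGER, inserted_at TEXT)"
)
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", ROWS)

# Keep a visible row only if no newer version of the same comment exists,
# whether or not that newer version is visible.
NOT_EXISTS_QUERY = """
    SELECT c.id
    FROM comments c
    WHERE c.show = 1
      AND NOT EXISTS (
          SELECT 1
          FROM comments c1
          WHERE c1.comment_id_no = c.comment_id_no
            AND c1.inserted_at > c.inserted_at
      )
    ORDER BY c.id
"""
ids = [r[0] for r in conn.execute(NOT_EXISTS_QUERY)]
print(ids)  # [4]
```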

You can use GROUP BY to get the latest id for each comment_id_no, then join back to the comments table to filter out the rows where show = false:

select c.* 
from comments c inner join (
  select comment_id_no, max(id) maxid
  from comments
  group by comment_id_no 
) g on g.maxid = c.id
where c.show = 'true'

I assume the id column is unique and auto-incrementing in the comments table.
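Because it relies on id increasing with each new version, the GROUP BY approach can be checked on the sample rows with an in-memory SQLite stand-in for Postgres (assumed: show stored as 1/0):

```python
import sqlite3

ROWS = [
    (1, 1, "hi", 1, "2000-01-01"),
    (2, 1, "hey", 1, "2001-01-01"),
    (3, 1, "hey", 0, "2002-01-01"),
    (4, 2, "new", 1, "2003-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, comment_id_no INTEGER, "
    "text TEXT, show INTEGER, inserted_at TEXT)"
)
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", ROWS)

# max(id) per comment_id_no identifies the latest version (ids 3 and 4);
# the join brings back the full row so show can be checked afterwards.
GROUP_BY_QUERY = """
    SELECT c.id
    FROM comments c
    INNER JOIN (
        SELECT comment_id_no, max(id) AS maxid
        FROM comments
        GROUP BY comment_id_no
    ) g ON g.maxid = c.id
    WHERE c.show = 1
    ORDER BY c.id
"""
ids = [r[0] for r in conn.execute(GROUP_BY_QUERY)]
print(ids)  # [4]
```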

You can do this without a subquery by using a LEFT JOIN:

SELECT  c.id, c.comment_id_no, c.text, c.show, c.inserted_at
FROM    Comments AS c
        LEFT JOIN Comments AS c2
            ON c2.comment_id_no = c.comment_id_no
            AND c2.inserted_at > c.inserted_at
WHERE   c2.id IS NULL
AND     c.show = 'true';
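The LEFT JOIN is an anti-join: a row survives only when no newer row of the same comment matched, i.e. c2.id IS NULL. A sketch of the same query against an in-memory SQLite stand-in (assumed: show stored as 1/0, ISO date strings):

```python
import sqlite3

ROWS = [
    (1, 1, "hi", 1, "2000-01-01"),
    (2, 1, "hey", 1, "2001-01-01"),
    (3, 1, "hey", 0, "2002-01-01"),
    (4, 2, "new", 1, "2003-01-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, comment_id_no INTEGER, "
    "text TEXT, show INTEGER, inserted_at TEXT)"
)
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", ROWS)

# Every row that has a newer sibling gets a non-NULL c2 match and is dropped;
# only ids 3 and 4 have none, and of those only id 4 is visible.
LEFT_JOIN_QUERY = """
    SELECT c.id
    FROM comments AS c
    LEFT JOIN comments AS c2
        ON c2.comment_id_no = c.comment_id_no
       AND c2.inserted_at > c.inserted_at
    WHERE c2.id IS NULL
      AND c.show = 1
    ORDER BY c.id
"""
ids = [r[0] for r in conn.execute(LEFT_JOIN_QUERY)]
print(ids)  # [4]
```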

I think every other approach requires some kind of subquery, which is usually done with a ranking function:

SELECT  c.id, c.comment_id_no, c.text, c.show, c.inserted_at
FROM    (   SELECT  c.id, 
                    c.comment_id_no, 
                    c.text, 
                    c.show, 
                    c.inserted_at,
                    ROW_NUMBER() OVER(PARTITION BY c.comment_id_no 
                                      ORDER BY c.inserted_at DESC) AS RowNumber
            FROM    Comments AS c
        ) AS c
WHERE   c.RowNumber = 1
AND     c.show = 'true';

Since you tagged this PostgreSQL, you can also use DISTINCT ON ():

SELECT  *
FROM    (   SELECT  DISTINCT ON (c.comment_id_no) 
                    c.id, c.comment_id_no, c.text, c.show, c.inserted_at
            FROM    Comments AS c 
            ORDER BY c.comment_id_no, c.inserted_at DESC
        ) x
WHERE   show = 'true';


As I said in my comment, I don't recommend polluting data tables with history/audit stuff.

And no: the "double versioning" suggested by @Josh_Eller in his comment isn't a good solution either, not only because it complicates queries unnecessarily but also because it is much more expensive in terms of processing and tablespace fragmentation.

Keep in mind that in PostgreSQL an UPDATE never modifies a row in place. Instead, it writes a whole new version of the row and marks the old one as deleted. That's why vacuum processes are needed to reclaim the space taken by those old versions.

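Anyway, besides being suboptimal, that approach forces you to write more complex queries to read and write data, while in practice I guess most of the time you only need to select, insert, update or even delete a single row, and only occasionally look at its history.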

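So the best solution (IMHO) is simply to implement the schema you actually need for your main task, and keep the history in a separate table maintained by triggers.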

This would be more:

  • Robust and simple: because you focus on one thing at a time (Single Responsibility and KISS principles).

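  • Fast: the audit operations can be performed in an AFTER trigger, so they only run once each INSERT, UPDATE or DELETE has been applied and the database engine knows its outcome won't change.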

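  • Efficient: of course, an update still inserts a new row and marks the old one as deleted, but that is done by the database engine at a low level. Moreover, your audit data will be completely unfragmented, because you only ever write to it (never update it). So overall fragmentation will always be much lower.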

That said, how do you implement it?

Assume this simple schema:

create table comments (
    text text,
    mtime timestamp not null default now(),
    id serial primary key
);

create table comments_audit ( -- Or audit.comments if using separate schema
    text text,
    mtime timestamp not null,
    id integer,
    rev integer not null,
    primary key (id, rev)
);

...then this function and trigger:

create or replace function fn_comments_audit()
returns trigger
language plpgsql
security definer
    -- This allows you to restrict permissions on the audit table,
    -- because the function is executed with the privileges of the user
    -- who defined it instead of the user whose statement triggered it.
as $$
DECLARE
BEGIN

    if TG_OP = 'DELETE' then
        raise exception 'FATAL: Deletion is not allowed for %', TG_TABLE_NAME;
        -- If you want to allow deletion there are a few more decisions to take...
        -- So here I block it for the sake of simplicity ;-)
    end if;

    insert into comments_audit (
        text
        , mtime
        , id
        , rev
    ) values (
        NEW.text
        , NEW.mtime
        , NEW.id
        , coalesce (
            (select max(rev) + 1 from comments_audit where id = new.ID)
            , 0
        )
    );

    return NULL;

END;
$$;

create trigger tg_comments_audit
    after insert or update or delete
    on public.comments
    for each row
    execute procedure fn_comments_audit()
;
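The trigger's behaviour can be tried out in an in-memory SQLite database. This is a translation, not the original plpgsql: SQLite triggers fire on a single event, so INSERT and UPDATE each get their own trigger, the DELETE ban becomes a BEFORE DELETE trigger using RAISE(ABORT, ...), and mtime is set explicitly in the statements for a deterministic demo. The trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        text  TEXT,
        mtime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        id    INTEGER PRIMARY KEY  -- SQLite's stand-in for serial
    );

    CREATE TABLE comments_audit (
        text  TEXT,
        mtime TEXT NOT NULL,
        id    INTEGER,
        rev   INTEGER NOT NULL,
        PRIMARY KEY (id, rev)
    );

    -- Copy every new version (NEW) into the audit table with the next rev number.
    CREATE TRIGGER tg_comments_audit_ins AFTER INSERT ON comments
    BEGIN
        INSERT INTO comments_audit (text, mtime, id, rev)
        VALUES (NEW.text, NEW.mtime, NEW.id,
                coalesce((SELECT max(rev) + 1 FROM comments_audit WHERE id = NEW.id), 0));
    END;

    CREATE TRIGGER tg_comments_audit_upd AFTER UPDATE ON comments
    BEGIN
        INSERT INTO comments_audit (text, mtime, id, rev)
        VALUES (NEW.text, NEW.mtime, NEW.id,
                coalesce((SELECT max(rev) + 1 FROM comments_audit WHERE id = NEW.id), 0));
    END;

    -- Mirrors the "raise exception" branch for DELETE.
    CREATE TRIGGER tg_comments_no_delete BEFORE DELETE ON comments
    BEGIN
        SELECT RAISE(ABORT, 'Deletion is not allowed for comments');
    END;
""")

# The application just writes to comments as if there were no audit system.
conn.execute("INSERT INTO comments (text, mtime) VALUES ('hi', '2000-01-01')")
conn.execute("UPDATE comments SET text = 'hey', mtime = '2001-01-01' WHERE id = 1")

current = conn.execute("SELECT id, text FROM comments").fetchall()
history = conn.execute(
    "SELECT rev, text, mtime FROM comments_audit WHERE id = 1 ORDER BY rev"
).fetchall()
print(current)  # [(1, 'hey')]
print(history)  # [(0, 'hi', '2000-01-01'), (1, 'hey', '2001-01-01')]

try:
    conn.execute("DELETE FROM comments WHERE id = 1")
    delete_blocked = False
except sqlite3.DatabaseError:  # RAISE(ABORT) surfaces as a constraint error
    delete_blocked = True
print(delete_blocked)  # True
```

Note how the current version also lands in comments_audit (rev 1), which is the redundancy discussed next.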

That's all.

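Note that with this approach you will always have the current comment data in comments_audit too. To avoid that, you could instead use the OLD record and define the trigger only for UPDATE (and DELETE) operations.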
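A minimal sketch of that OLD-record variant, again using SQLite as a stand-in for Postgres (the trigger names are hypothetical and this is not the answer's plpgsql). Only replaced or deleted versions are archived, so the audit table never duplicates the live row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        text  TEXT,
        mtime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        id    INTEGER PRIMARY KEY
    );
    CREATE TABLE comments_audit (
        text  TEXT,
        mtime TEXT NOT NULL,
        id    INTEGER,
        rev   INTEGER NOT NULL,
        PRIMARY KEY (id, rev)
    );

    -- Archive the version being replaced (OLD), not the new one.
    CREATE TRIGGER tg_comments_audit_upd AFTER UPDATE ON comments
    BEGIN
        INSERT INTO comments_audit (text, mtime, id, rev)
        VALUES (OLD.text, OLD.mtime, OLD.id,
                coalesce((SELECT max(rev) + 1 FROM comments_audit WHERE id = OLD.id), 0));
    END;

    -- Archive the last version before it disappears.
    CREATE TRIGGER tg_comments_audit_del AFTER DELETE ON comments
    BEGIN
        INSERT INTO comments_audit (text, mtime, id, rev)
        VALUES (OLD.text, OLD.mtime, OLD.id,
                coalesce((SELECT max(rev) + 1 FROM comments_audit WHERE id = OLD.id), 0));
    END;
""")

conn.execute("INSERT INTO comments (text, mtime) VALUES ('hi', '2000-01-01')")
# No INSERT trigger, so nothing is archived yet.
audit_after_insert = conn.execute("SELECT count(*) FROM comments_audit").fetchone()[0]
conn.execute("UPDATE comments SET text = 'hey', mtime = '2001-01-01' WHERE id = 1")
conn.execute("DELETE FROM comments WHERE id = 1")

history = conn.execute(
    "SELECT rev, text FROM comments_audit WHERE id = 1 ORDER BY rev"
).fetchall()
remaining = conn.execute("SELECT count(*) FROM comments").fetchone()[0]
print(audit_after_insert, history, remaining)  # 0 [(0, 'hi'), (1, 'hey')] 0
```

The trade-off is that reconstructing a full history now needs the audit table plus the live row, which is why the answer prefers recording NEW on every write.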

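But I prefer this approach, not only because it gives us extra redundancy (after an accidental deletion on the master table, if deletion is allowed, or an accidentally disabled trigger, we would be able to recover all the data from the audit table) but also because it simplifies (and speeds up) querying the history when needed.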

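Now you just insert, update or select (or even delete, if you develop this schema further, e.g. by inserting rows with null values...) in a completely transparent way, as if there were no audit system at all. And when you need that data, you just query the audit table.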

NOTE: You might also want to include a creation timestamp (ctime). In that case it would be worth preventing it from being modified with a BEFORE trigger, so I omitted it (again for the sake of simplicity), since you can already derive it from the mtimes in the audit table. Even so, if you are going to use it in your application, adding it is highly advisable.