Returning the row with the most recent timestamp from each group

I have a table (Postgres 9.3) defined as follows:

 CREATE TABLE tsrs (
     id SERIAL PRIMARY KEY,
     customer_id INTEGER NOT NULL REFERENCES customers,
     timestamp TIMESTAMP WITHOUT TIME ZONE,
     licensekeys_checksum VARCHAR(32));

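The relevant columns here are customer_id, timestamp, and licensekeys_checksum. There can be multiple entries with the same customer_id; some of them may have matching licensekeys_checksum values and some may differ. There will never be two rows with both the same checksum and the same timestamp.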

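I want to return a table containing one row for each group of rows with matching licensekeys_checksum values. The row returned for each group should be the one with the most recent timestamp.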

Sample input:

1, 2, 2014-08-21 16:03:35, 3FF2561A
2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
4, 5, 2013-12-13 08:14:26, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891

Desired output:

2, 2, 2014-08-22 10:00:41, 3FF2561A
2, 2, 2014-06-10 10:00:41, 081AB3CA
3, 5, 2014-02-01 12:03:23, 299AFF90
5, 6, 2013-09-09 18:21:53, 49FFA891
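To try the queries in this thread against the sample rows, something like the following could be used. It is only a sketch: the one-column customers table is a stand-in I made up so the foreign key can be satisfied, and id is left to the SERIAL default because the sample lists id 2 twice, which the primary key would reject.

```sql
-- Hypothetical minimal parent table; create it BEFORE running
-- the CREATE TABLE tsrs statement from the question.
CREATE TABLE customers (id INTEGER PRIMARY KEY);
INSERT INTO customers (id) VALUES (2), (5), (6);

-- Run this AFTER tsrs has been created.
-- id is omitted so the SERIAL default assigns unique values.
INSERT INTO tsrs (customer_id, timestamp, licensekeys_checksum) VALUES
    (2, '2014-08-21 16:03:35', '3FF2561A'),
    (2, '2014-08-22 10:00:41', '3FF2561A'),
    (2, '2014-06-10 10:00:41', '081AB3CA'),
    (5, '2014-02-01 12:03:23', '299AFF90'),
    (5, '2013-12-13 08:14:26', '299AFF90'),
    (6, '2013-09-09 18:21:53', '49FFA891');
```

Since the ids are generated, they may not match the ones shown in the sample; the timestamps and checksums are what matter for checking the result.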

I have managed to piece together a query based on the comments below and a few hours of searching the internet. :)

select * from tsrs
inner join (
   select licensekeys_checksum, max(timestamp) as mts
   from tsrs
   group by licensekeys_checksum
   ) x on x.licensekeys_checksum = tsrs.licensekeys_checksum
      and x.mts = tsrs.timestamp;

It seems to work, but I am not sure. Am I on the right track?

Try this

select * 
from tsrs
where (timestamp,licensekeys_checksum) in (
                                          select max(timestamp)
                                                ,licensekeys_checksum
                                          from tsrs 
                                          group by licensekeys_checksum) 

>SqlFiddle Demo

with cte as (
            select id
                   ,customer_id
                   ,timestamp
                   ,licensekeys_checksum
                   ,row_number () over (partition by  licensekeys_checksum  ORDER BY timestamp DESC) as rk
            from  tsrs)
select  id
       ,customer_id
       ,timestamp
       ,licensekeys_checksum  
from cte where rk=1 order by id

>SqlFiddle Demo


Reference: Window Functions, row_number(), and CTE

An alternative way to de-duplicate, using NOT EXISTS (...):

SELECT *
FROM tsrs t
WHERE NOT EXISTS (
    SELECT *
    FROM tsrs x
    WHERE x.customer_id = t.customer_id                  -- same customer
    AND x.licensekeys_checksum = t.licensekeys_checksum  -- same checksum
    AND x.timestamp > t.timestamp                        -- but more recent
    );

The query in your question should perform better than the queries in the (previously) accepted answer. Test with EXPLAIN ANALYZE.
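For example, prefixing the query from the question with EXPLAIN ANALYZE executes it and prints the actual plan with row counts and timings. Doing the same for each alternative lets you compare them on your own data; which one wins depends on table size and available indexes, so treat this as a sketch of the procedure rather than a prediction.

```sql
-- EXPLAIN ANALYZE really runs the statement, then reports the plan it used.
EXPLAIN ANALYZE
SELECT tsrs.*
FROM   tsrs
JOIN  (
   SELECT licensekeys_checksum, max(timestamp) AS mts
   FROM   tsrs
   GROUP  BY licensekeys_checksum
   ) x ON  x.licensekeys_checksum = tsrs.licensekeys_checksum
       AND x.mts = tsrs.timestamp;
```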

DISTINCT ON is typically simpler and faster:

SELECT DISTINCT ON (licensekeys_checksum) *
FROM   tsrs
ORDER  BY licensekeys_checksum, timestamp DESC NULLS LAST;

db<>fiddle here
sqlfiddle
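A note on NULLS LAST: timestamp is nullable in the table definition, and Postgres sorts NULL first in descending order, so without it a row with no timestamp would be picked ahead of dated rows. If the DISTINCT ON query turns out to be slow on a large table, an index whose sort order matches the ORDER BY is one thing to try. A sketch, with an index name I made up; whether the planner actually uses it depends on the data:

```sql
-- Hypothetical index name; column order and sort direction mirror
-- ORDER BY licensekeys_checksum, timestamp DESC NULLS LAST.
CREATE INDEX tsrs_checksum_ts_idx
    ON tsrs (licensekeys_checksum, timestamp DESC NULLS LAST);
```

Re-running the query under EXPLAIN ANALYZE before and after creating the index shows whether it helps.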

Detailed explanation:

  • Select first row in each GROUP BY group?