SQL: find max date based on another non-null column

I have a table like this:

|uniqueID|scandatetime       |scanfacilityname|
+--------+-------------------+----------------+
|12345678|01-01-2020 13:45:12|BALTIMORE       |
|12345678|01-02-2020 22:45:12|BALTIMORE       |
|12345678|01-04-2020 10:15:12|PHILADELPHIA    |
|12345678|01-05-2020 08:45:12|                |

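I want to return one full row per uniqueID containing the uniqueID, the scandatetime, and the most recent scanfacilityname (that is, the row with the max scandatetime among rows where scanfacilityname is not null). I tried the following query: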

SELECT
"uniqueID"
, "max"(CAST("scandatetime" AS timestamp)) "timestamp"
, COALESCE("scanfacilityname") "scanfacilityname"
FROM
iv_scans_new.scan_data
WHERE (("partition_0" = '2020') AND ("partition_1" IN ('06', '07', '08'))) and  scanfacilityname is not null
group by 1, 3
;

But I'm not sure whether this is correct, or whether I need COALESCE at all.

One option is to filter with a correlated subquery:

select s.*
from iv_scans_new.scan_data s
where s.scandatetime = (
    select max(s1.scandatetime)
    from iv_scans_new.scan_data s1
    where s1.uniqueID = s.uniqueID and s1.scanfacilityname is not null
)

You can also use row_number():

select *
from (
    select 
        s.*, 
        row_number() over(partition by uniqueID order by scandatetime desc) rn
    from iv_scans_new.scan_data s
    where scanfacilityname is not null
) s
where rn = 1
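
One caveat, sketched here under stated assumptions: if scandatetime is stored as text in the MM-DD-YYYY HH:MM:SS layout shown in the question (the CAST in the original query suggests it is not already a timestamp), ordering the raw strings sorts by month first, so it is only chronological within a single year. Assuming a Presto/Athena engine, date_parse can convert the text before ordering. The format string below is a guess based on the sample rows and should be checked against the real data.

```sql
-- Sketch: assumes scandatetime is varchar like '01-04-2020 10:15:12' (month first)
-- and a Presto/Athena engine, where date_parse takes MySQL-style format codes.
select *
from (
    select
        s.*,
        row_number() over(
            partition by uniqueID
            -- parse the text so the ordering is chronological, not lexicographic
            order by date_parse(scandatetime, '%m-%d-%Y %H:%i:%s') desc
        ) rn
    from iv_scans_new.scan_data s
    where scanfacilityname is not null
) s
where rn = 1
```

If the column is already a timestamp, the plain order by scandatetime desc is enough and this conversion can be skipped.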

You can use the max_by function:

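-- max_by(x, y) returns the x from the row with the largest y
select uniqueID, max(scandatetime) scandatetime, max_by(scanfacilityname, scandatetime) scanfacilityname
from iv_scans_new.scan_data
where scanfacilityname is not null
group by uniqueID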

See the doc.

There is no need for coalesce, since the max and max_by functions effectively ignore null values.
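
To see how the aggregate approach behaves on the sample rows from the question, here is a self-contained sketch for Presto/Athena. Keep in mind that max_by(x, y) returns the x value from the row with the largest y, so the ordering argument is scandatetime and the value argument is scanfacilityname. The inline scan_data CTE is a stand-in for iv_scans_new.scan_data, and the blank facility in the last row is assumed to be NULL rather than an empty string.

```sql
-- Sketch: inline sample data replaces the real table (an assumption for illustration).
with scan_data as (
    select *
    from (
        values
            ('12345678', '01-01-2020 13:45:12', 'BALTIMORE'),
            ('12345678', '01-02-2020 22:45:12', 'BALTIMORE'),
            ('12345678', '01-04-2020 10:15:12', 'PHILADELPHIA'),
            ('12345678', '01-05-2020 08:45:12', cast(null as varchar))
    ) as t (uniqueID, scandatetime, scanfacilityname)
)
select
    uniqueID,
    -- latest scan time among rows that have a facility
    max(scandatetime) as scandatetime,
    -- facility name taken from the row with that latest scan time
    max_by(scanfacilityname, scandatetime) as scanfacilityname
from scan_data
where scanfacilityname is not null  -- drop rows with no facility before aggregating
group by uniqueID
```

For these four rows the result should be the single row 12345678, 01-04-2020 10:15:12, PHILADELPHIA. If blank facilities are stored as empty strings instead of NULL, the filter would also need scanfacilityname <> ''.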