Finding lines with extreme values using SQL

I have a table, t1, with 4 columns:

key, cd, date, result_num

In SAS we have the following code:

PROC SQL;
    create table t2 AS
    select * from t1
    group by key
    having date = MAX(date)
    order by key, cd;
RUN;

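My impression was that when an aggregate function (such as MAX) is used, every selected column must either appear in the GROUP BY or have an aggregate function applied to it. My goal is to convert this SAS code to SQL. Is there a way to do this in SQL (more specifically, HiveQL)?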

I don't think your query does what you want in SAS . . . although perhaps it does. In standard SQL (and Hive), you can do:

create table t2 AS
    select *
    from (select t1.*,
                 row_number() over (partition by key order by date desc) as seqnum
          from t1
         ) t1
    where seqnum = 1
    order by key, cd;
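
One difference worth checking: row_number() keeps exactly one row per key, whereas a filter of the form date = MAX(date) keeps every row tied for the latest date. If ties matter, rank() may be the closer match. Here is a minimal sketch on made-up data, using Python's built-in sqlite3 module as a stand-in for Hive. The sample rows, the SQLite engine, and the helper name latest_rows are assumptions for illustration, not part of the question, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t1 ("key" TEXT, cd TEXT, "date" TEXT, result_num REAL)')

# Made-up sample rows: key "a" has two rows tied on its latest date.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        ("a", "x", "2020-01-01", 1.0),
        ("a", "y", "2020-02-01", 2.0),
        ("a", "z", "2020-02-01", 3.0),
        ("b", "x", "2020-03-01", 4.0),
    ],
)


def latest_rows(window_fn):
    """Return the rows ranked first per key by the given window function."""
    sql = f"""
        SELECT "key", cd, "date", result_num
        FROM (SELECT t1.*,
                     {window_fn}() OVER (PARTITION BY "key"
                                         ORDER BY "date" DESC) AS seqnum
              FROM t1) AS ranked
        WHERE seqnum = 1
        ORDER BY "key", cd
    """
    return conn.execute(sql).fetchall()


rows_row_number = latest_rows("row_number")  # one row per key, ties broken arbitrarily
rows_rank = latest_rows("rank")              # all rows tied for the latest date
```

Listing the columns explicitly in the outer SELECT also keeps the helper column seqnum out of the result, which select * would otherwise carry into t2.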

The trick is to access your input table twice: once to compute the maximum date, and once to select the matching rows.

If you are looking for the rows whose date is the latest in the whole table, that is:

PROC SQL;
    create table t2 AS
    select * from t1
    where date = (select MAX(date) from t1)
    order by key, cd;
RUN;

If you are looking for the rows whose date is the latest date for the same key, that is:

PROC SQL;
    create table t2 AS
    select t1.* from t1 inner join
    (  select key, MAX(date) as maxDate
       from t1  
       group by key) as m1 
       on m1.key = t1.key and m1.maxDate = t1.date
    order by key, cd;
RUN;
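
To see how the two variants differ, here is a small sketch that runs both on made-up data with Python's built-in sqlite3 module. SQLite is only a stand-in for PROC SQL or Hive, and the sample rows are assumptions for illustration. Note that the per-key subquery has to return key as well as the maximum date, otherwise the join condition has nothing to match on.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PROC SQL / Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t1 ("key" TEXT, cd TEXT, "date" TEXT, result_num REAL)')

# Made-up sample rows: key "a" has two rows tied on its latest date.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        ("a", "x", "2020-01-01", 1.0),
        ("a", "y", "2020-02-01", 2.0),
        ("a", "z", "2020-02-01", 3.0),
        ("b", "x", "2020-03-01", 4.0),
    ],
)

# Variant 1: rows whose date is the latest in the whole table.
global_latest = conn.execute(
    """
    SELECT "key", cd, "date", result_num
    FROM t1
    WHERE "date" = (SELECT MAX("date") FROM t1)
    ORDER BY "key", cd
    """
).fetchall()

# Variant 2: rows whose date is the latest for their own key.
per_key_latest = conn.execute(
    """
    SELECT t1."key", t1.cd, t1."date", t1.result_num
    FROM t1
    INNER JOIN (SELECT "key", MAX("date") AS maxDate
                FROM t1
                GROUP BY "key") AS m1
      ON m1."key" = t1."key" AND m1.maxDate = t1."date"
    ORDER BY t1."key", t1.cd
    """
).fetchall()
```

On this data the first query returns only the row for key "b", while the second returns the latest rows for every key, including both tied rows for key "a".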