How can I get the minimum date based on a condition in Redshift?

Question

Suppose you have the following dataset:

id    date_col       boolean_col
1     2020-01-01     0
1     2020-01-05     1
1     2020-02-01     0
1     2020-03-01     1
2     2020-01-01     0
2     2020-05-01     0
3     2020-01-01     0
3     2020-03-05     1

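My final output should be grouped, with one row per id. The grouping rule I want is: if the boolean column is true, take the minimum date for the id (or the maximum; I would like to test both, if possible). If all boolean values for the id are false, take the latest date instead. The desired output would look like this: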

id    date_col       boolean_col
1     2020-01-05     1
2     2020-05-01     0
3     2020-03-05     1

Any ideas on how to get this? I am really struggling to find a way.

Answer 1

One method is row_number():

select t.*
from (select t.*,
             row_number() over (partition by id order by boolean_col desc, date_col desc) as seqnum
      from t
     ) t
where seqnum = 1;

There are two other interesting approaches. One is to aggregate cleverly:

select id,
       coalesce(max(case when boolean_col = 1 then date_col end),
                max(date_col)
               ) as date_col,
       max(boolean_col) as boolean_col
from t
group by id;

The other treats this as a prioritization and uses union all:

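-- ids with at least one true row: latest date among the true rows
select id, max(date_col) as date_col, max(boolean_col) as boolean_col
from t
where boolean_col = 1
group by id
union all
-- ids with no true row: latest date overall
select id, max(date_col), max(boolean_col)
from t
group by id
having max(boolean_col) = 0;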
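
The queries above return the latest date among the true rows. The expected output in the question shows the earliest one instead (2020-01-05 for id 1), so here is a minimal sketch of that variant, based on the aggregation approach. It assumes boolean_col is stored as an integer 0/1, as in the sample data, and it inlines the question's rows in a CTE named t only so that the query can be run on its own:

```sql
-- Sample rows from the question, inlined so the query is self-contained.
with t as (
    select 1 as id, cast('2020-01-01' as date) as date_col, 0 as boolean_col union all
    select 1, cast('2020-01-05' as date), 1 union all
    select 1, cast('2020-02-01' as date), 0 union all
    select 1, cast('2020-03-01' as date), 1 union all
    select 2, cast('2020-01-01' as date), 0 union all
    select 2, cast('2020-05-01' as date), 0 union all
    select 3, cast('2020-01-01' as date), 0 union all
    select 3, cast('2020-03-05' as date), 1
)
select id,
       -- earliest date among the true rows; this is NULL when the id has
       -- no true row, so coalesce falls back to the latest date overall
       coalesce(min(case when boolean_col = 1 then date_col end),
                max(date_col)
               ) as date_col,
       max(boolean_col) as boolean_col
from t
group by id
order by id;
```

On the sample data this should give 2020-01-05 for id 1, 2020-05-01 for id 2, and 2020-03-05 for id 3, which matches the desired output. The only change from the max version is min in the first argument of coalesce, so switching between the two for testing is a one-word edit.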
