How to use a window function to get the minimum value for a bunch of groups, but grouped in an in-place sorted manner
I have a table, say customer:
DATE | ID | TYPE |
---|---|---|
2018-01-01 | 1 | FI |
2019-01-01 | 1 | LF |
2020-01-01 | 1 | LF |
2021-01-01 | 1 | FI |
2022-01-01 | 1 | LF |
Now what I want to do is add a new column, NEW_DATE, with the following logic:
if TYPE = FI then set it to NULL
else if TYPE = LF then take the lowest DATE within the run of consecutive LF rows
Expected output:
DATE | ID | TYPE | NEW_DATE |
---|---|---|---|
2018-01-01 | 1 | FI | NULL |
2019-01-01 | 1 | LF | 2019-01-01 |
2020-01-01 | 1 | LF | 2019-01-01 |
2021-01-01 | 1 | FI | NULL |
2022-01-01 | 1 | LF | 2022-01-01 |
Rows 2 and 3 have consecutive LF codes, so they take the lowest date, 2019-01-01, whereas row 4 is NULL and breaks the chain. So row 5 gets 2022-01-01.
Now I am thinking of using a window function like:
CASE
WHEN TYPE <> 'LF'
THEN NULL
ELSE MIN(DATE) OVER (PARTITION BY TYPE ORDER BY DATE)
END AS NEW_DATE
But this treats all rows of a type as a single group. So what would be the solution?
I think you want:
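To make the problem concrete, the attempted expression can be run against the sample rows. This is only a sketch: it assumes SQLite (3.25 or later, for window functions) driven through Python's `sqlite3` module, which is not necessarily the database used in the question, and it stores the dates as ISO text.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (date TEXT, id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("2018-01-01", 1, "FI"),
        ("2019-01-01", 1, "LF"),
        ("2020-01-01", 1, "LF"),
        ("2021-01-01", 1, "FI"),
        ("2022-01-01", 1, "LF"),
    ],
)

# The attempted expression: every LF row lands in the same TYPE partition,
# so the running minimum never resets after the FI row in between.
attempt_rows = conn.execute(
    """
    SELECT date, id, type,
           CASE WHEN type <> 'LF' THEN NULL
                ELSE MIN(date) OVER (PARTITION BY type ORDER BY date)
           END AS new_date
    FROM customer
    ORDER BY date
    """
).fetchall()

for row in attempt_rows:
    print(row)
```

The last row comes back with 2019-01-01 rather than the desired 2022-01-01, because partitioning by TYPE alone cannot tell the two runs of LF rows apart.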
(CASE WHEN TYPE = 'LF'
THEN MIN(CASE WHEN TYPE = 'FI' THEN DATE END) OVER (PARTITION BY ID ORDER BY DATE DESC)
END) AS NEW_DATE
I added PARTITION BY ID because I am guessing you want this per ID. If not, remove the PARTITION BY clause.
Here is a db<>fiddle.
EDIT:
I misunderstood the question. You seem to want:
select t.*,
(case when type = 'LF'
then max(case when type = 'LF' and (prev_type is null or prev_type <> type)
then date
end) over (partition by id order by date)
end) as lf_date
from (select t.*,
lag(type) over (partition by id order by date) as prev_type
from t
) t
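As a rough check of the LAG-based query, here is a sketch that runs it on the question's sample rows. It assumes SQLite 3.25 or later through Python's `sqlite3` module and uses the table name `customer` in place of `t`; both are choices made for this illustration, not part of the answer.

```python
import sqlite3

# In-memory SQLite database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (date TEXT, id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("2018-01-01", 1, "FI"),
        ("2019-01-01", 1, "LF"),
        ("2020-01-01", 1, "LF"),
        ("2021-01-01", 1, "FI"),
        ("2022-01-01", 1, "LF"),
    ],
)

# The inner query tags each row with the previous row's type. The outer query
# keeps the date only on rows that start a run of LF rows, then carries the
# latest such start date forward with a running MAX.
lag_rows = conn.execute(
    """
    SELECT t.date, t.id, t.type,
           CASE WHEN t.type = 'LF'
                THEN MAX(CASE WHEN t.type = 'LF'
                               AND (t.prev_type IS NULL OR t.prev_type <> t.type)
                              THEN t.date
                         END) OVER (PARTITION BY t.id ORDER BY t.date)
           END AS lf_date
    FROM (SELECT c.*,
                 LAG(c.type) OVER (PARTITION BY c.id ORDER BY c.date) AS prev_type
          FROM customer c
         ) t
    ORDER BY t.date
    """
).fetchall()

for row in lag_rows:
    print(row)
```

The running MAX works here because the start dates of successive runs increase with the ordering column, so the most recent run start is always the largest one seen so far.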
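I think I understand what you need: you need to define a group for each block of consecutive rows of the same type, then get the minimum date of each block.
This is a type of gaps-and-islands problem. In the CTE, we assign an incrementing row number to the rows of type LF only and subtract from it the row number taken over all rows, which yields the same value for every row in a run of consecutive LF rows. That value then gives us something to partition/group by, so we can get the minimum date of each consecutive block.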
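with grouped as (
  select id, date, type,
    -- row number among LF rows minus row number among all rows:
    -- constant within each run of consecutive LF rows, NULL for other types
    case when type = 'LF' then row_number() over (partition by id, type order by date) end
      - row_number() over (partition by id order by date) as gp
  from customer
)
select date, id, type,
  -- partition by id as well, so equal gp values from different ids do not mix
  case when type = 'LF' then min(date) over (partition by id, gp) end as New_Date
from grouped
order by date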
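The row-number difference is easier to follow with the `gp` values printed next to each row. The sketch below assumes SQLite 3.25 or later through Python's `sqlite3` module. The two rows for ID 2 are hypothetical and are not in the question; they are added only to show why this version partitions the final MIN by `id` as well as `gp`, since different IDs can produce the same `gp` value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (date TEXT, id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        # Sample rows from the question.
        ("2018-01-01", 1, "FI"),
        ("2019-01-01", 1, "LF"),
        ("2020-01-01", 1, "LF"),
        ("2021-01-01", 1, "FI"),
        ("2022-01-01", 1, "LF"),
        # Hypothetical second ID, added for illustration only.
        ("2017-01-01", 2, "FI"),
        ("2017-06-01", 2, "LF"),
    ],
)

# gp = (row number among the LF rows of an id) - (row number among all rows of
# that id). It is constant inside a run of consecutive LF rows and NULL for
# every other type.
island_rows = conn.execute(
    """
    WITH grouped AS (
        SELECT id, date, type,
               CASE WHEN type = 'LF'
                    THEN ROW_NUMBER() OVER (PARTITION BY id, type ORDER BY date)
               END
               - ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS gp
        FROM customer
    )
    SELECT date, id, type, gp,
           CASE WHEN type = 'LF'
                THEN MIN(date) OVER (PARTITION BY id, gp)
           END AS new_date
    FROM grouped
    ORDER BY id, date
    """
).fetchall()

for row in island_rows:
    print(row)
```

Both the first LF run of ID 1 and the single LF row of ID 2 get `gp = -1`, so partitioning by `gp` alone would merge them and hand ID 1 the 2017-06-01 date.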