Add a column to populate rank for every group
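I have historical data of account details, where the account activity status is either 'Active' or 'Cancelled'. When an account is reopened, its status changes back to 'Active' and can later become 'Cancelled' again, as in the data below. I now want to distinguish the rows belonging to each reopening of the account (account_sub_number).
I used the following query: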
select status,status_code,account_number,date,
row_number() over (partition by account_number,status_code order by
date ) as Account_Sub_Number
from schema.account where account_number= 1234
order by date
Source_data:
Account Number  Status     Status Code  Date
1234            Active     A            2017-12-04
1234            Active     A            2017-12-05
1234            Active     A            2017-12-06
1235            Active     A            2017-12-07
1234            Active     A            2018-03-02
1234            Cancelled  C            2018-03-03
1234            Cancelled  C            2018-03-04
1234            Cancelled  C            2018-05-10
1234            Cancelled  C            2018-05-11
1234            Active     A            2018-05-24
1234            Active     A            2018-05-25
1234            Active     A            2018-05-26
1234            Active     A            2018-05-27
1234            Cancelled  C            2018-05-28
1234            Cancelled  C            2018-06-15
1234            Cancelled  C            2018-06-16
1234            Cancelled  C            2018-06-17
Desired output:
Account Number  Status     Status Code  Date        Account Sub Number
1234            Active     A            2017-12-04  1
1234            Active     A            2017-12-05  1
1234            Active     A            2017-12-06  1
1235            Active     A            2017-12-07  1
1234            Active     A            2018-03-02  1
1234            Cancelled  C            2018-03-03  1
1234            Cancelled  C            2018-03-04  1
1234            Cancelled  C            2018-05-10  1
1234            Cancelled  C            2018-05-11  1
1234            Active     A            2018-05-24  2
1234            Active     A            2018-05-25  2
1234            Active     A            2018-05-26  2
1234            Active     A            2018-05-27  2
1234            Cancelled  C            2018-05-28  2
1234            Cancelled  C            2018-06-15  2
1234            Cancelled  C            2018-06-16  2
1234            Cancelled  C            2018-06-17  2
My query result:
Account Number  Status     Status Code  Date        Account_sub_number
1234            Active     A            2017-12-04  1
1234            Active     A            2017-12-05  2
1234            Active     A            2017-12-06  3
1235            Active     A            2017-12-07  4
1234            Active     A            2018-03-02  5
1234            Active     A            2018-05-24  6
1234            Active     A            2018-05-25  7
1234            Active     A            2018-05-26  8
1234            Active     A            2018-05-27  9
1234            Cancelled  C            2018-03-03  1
1234            Cancelled  C            2018-03-04  2
1234            Cancelled  C            2018-05-10  3
1234            Cancelled  C            2018-05-11  4
1234            Cancelled  C            2018-05-28  5
1234            Cancelled  C            2018-06-15  6
1234            Cancelled  C            2018-06-16  7
1234            Cancelled  C            2018-06-17  8
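Use lag to get the previous row's status (per account, ordered by date), then compare it with the current status to flag where each group starts. A running sum of those flags gives the group number.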
select t.*
-- a group starts on the first row of an account or when 'Active' follows 'Cancelled'
,sum(case when prev_status is null or (prev_status='Cancelled' and status='Active') then 1 else 0 end)
over(partition by account_number order by date) as sub_account_number
from (select status,status_code,account_number,date,
-- status of the previous row for the same account, in date order
lag(status) over (partition by account_number order by date) as prev_status
from schema.account
where account_number= 1234
) t
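To check the lag approach against the sample data, here is a minimal sketch that loads the question's rows into an in-memory SQLite database from Python and runs the same query. It assumes a SQLite build with window-function support (3.25 or later). The table is named account without the schema prefix, the subquery alias is t, and run_lag_query is a hypothetical helper name; none of these come from the original post.

```python
import sqlite3

# Rows copied from the question's source data.
ROWS = [
    (1234, "Active", "A", "2017-12-04"),
    (1234, "Active", "A", "2017-12-05"),
    (1234, "Active", "A", "2017-12-06"),
    (1235, "Active", "A", "2017-12-07"),
    (1234, "Active", "A", "2018-03-02"),
    (1234, "Cancelled", "C", "2018-03-03"),
    (1234, "Cancelled", "C", "2018-03-04"),
    (1234, "Cancelled", "C", "2018-05-10"),
    (1234, "Cancelled", "C", "2018-05-11"),
    (1234, "Active", "A", "2018-05-24"),
    (1234, "Active", "A", "2018-05-25"),
    (1234, "Active", "A", "2018-05-26"),
    (1234, "Active", "A", "2018-05-27"),
    (1234, "Cancelled", "C", "2018-05-28"),
    (1234, "Cancelled", "C", "2018-06-15"),
    (1234, "Cancelled", "C", "2018-06-16"),
    (1234, "Cancelled", "C", "2018-06-17"),
]

# Same logic as the answer's query; only the table name and alias differ.
QUERY = """
select t.account_number, t.status, t.date,
       sum(case when prev_status is null
                  or (prev_status = 'Cancelled' and status = 'Active')
                then 1 else 0 end)
           over (partition by account_number order by date) as sub_account_number
from (select status, status_code, account_number, date,
             lag(status) over (partition by account_number order by date) as prev_status
      from account
      where account_number = 1234) t
order by date
"""


def run_lag_query():
    """Load the sample rows into SQLite and return the query result as tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table account ("
        "account_number integer, status text, status_code text, date text)"
    )
    conn.executemany("insert into account values (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_lag_query():
        print(row)
```

On the sample data this gives sub-number 1 for the first eight rows of account 1234 and 2 for the remaining eight, matching the desired output. The ISO-formatted date strings sort correctly as text, which is why a plain text column is enough for this sketch.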
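Basically, you need to define the groups. In this case, you can mark where a group starts by looking for an active status that follows a non-active status.
Then the cumulative sum of the group starts is the sub-number you are looking for: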
select a.*,
sum(case when prev_status_code = status_code or
status <> 'Active'
then 0 else 1
end) over (partition by account_number order by date range between unbounded preceding and current row) as account_subnumber
from (select a.*,
lag(status_code) over (partition by account_number order by date) as prev_status_code
from schema.account a
) a
where account_number = 1234
order by date;
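The flag-then-accumulate idea can also be traced outside the database. The following is a small sketch in plain Python, assuming the status codes for one account are already sorted by date; running_group_starts is a hypothetical name used only for illustration. It mirrors the case expression above: a row counts as a group start only when it is active and the previous code differs (or there is no previous row).

```python
from itertools import accumulate

# Status codes for account 1234 in date order, taken from the question's source data.
CODES = [
    "A", "A", "A", "A",
    "C", "C", "C", "C",
    "A", "A", "A", "A",
    "C", "C", "C", "C",
]


def running_group_starts(status_codes):
    """Return the cumulative count of group starts for codes in date order."""
    # previous[i] plays the role of lag(status_code); None stands in for SQL null.
    previous = [None] + list(status_codes[:-1])
    # 1 marks the start of a group: an active row whose previous code is not active.
    starts = [
        1 if code == "A" and prev != "A" else 0
        for prev, code in zip(previous, status_codes)
    ]
    # The running sum of the start flags is the sub-number.
    return list(accumulate(starts))


if __name__ == "__main__":
    print(running_group_starts(CODES))
```

One difference between the two answers shows up in this trace: with this condition, an account whose history begins with a cancelled row gets sub-number 0 until its first active row, whereas the lag(status) query above counts the first row as a group start regardless of its status.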