Count consecutive duplicate values by group
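I searched the site for a solution to this problem but couldn't find an answer that matches exactly what I'm looking for. I'm trying to count consecutive duplicate values for each ID number, ordered by date. My current table looks like the first three columns of the table below; the fourth column is the one I'd like to add.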
ID | date | value | consec_duplicates
1  | 1/1  | 1     | 0
1  | 1/2  | 2     | 0
1  | 1/3  | 2     | 1
1  | 1/4  | 2     | 2
1  | 1/5  | 3     | 0
1  | 1/6  | 3     | 1
2  | 1/14 | 1     | 0
2  | 1/15 | 2     | 0
2  | 1/16 | 3     | 0
2  | 1/17 | 3     | 1
2  | 1/18 | 4     | 0
2  | 1/19 | 5     | 0
3  | 1/4  | 1     | 0
3  | 1/5  | 2     | 0
3  | 1/6  | 2     | 1
3  | 1/7  | 2     | 2
3  | 1/8  | 2     | 3
3  | 1/9  | 3     | 0
Does anyone know how to construct this fourth column? Thanks!
This is a gaps-and-islands problem. One approach is to use the difference of two row_number()s to identify the groups.
select t.*,
       -- grp gives each run of equal values within an id its own label
       dense_rank() over (partition by id order by (seqnum - seqnum_value), value) as grp,
       -- position within the run, starting at 1 (subtract 1 for a zero-based count)
       row_number() over (partition by id, (seqnum - seqnum_value), value order by date) as grp_seqnum
from (select t.*,
             row_number() over (partition by id order by date) as seqnum,
             row_number() over (partition by id, value order by date) as seqnum_value
      from t
     ) t;
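This is a bit hard to follow the first time you see it. If you run the subquery and stare at the results long enough, you'll see why the difference between the two row numbers is constant across adjacent rows with the same value.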
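To make that concrete, here is a minimal sketch that runs the subquery on the ID 1 rows from the question, using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table name t, the column name dt (standing in for date), and the zero-padded date strings are assumptions of this sketch, not part of the original answer.

```python
import sqlite3

# Rows for ID 1 from the question's sample. The column name dt and the
# zero-padded text dates are assumptions made so the rows sort correctly.
rows = [(1, "01-01", 1), (1, "01-02", 2), (1, "01-03", 2),
        (1, "01-04", 2), (1, "01-05", 3), (1, "01-06", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, dt text, value integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

subquery = """
select dt, value,
       row_number() over (partition by id order by dt) as seqnum,
       row_number() over (partition by id, value order by dt) as seqnum_value
from t
order by id, dt
"""

# Append the difference of the two row numbers to each row.
result = [(dt, value, seqnum, seqnum_value, seqnum - seqnum_value)
          for dt, value, seqnum, seqnum_value in conn.execute(subquery)]
for r in result:
    print(r)
```

Within a run of equal values both row numbers go up by one per row, so their difference stays the same; when the value changes, seqnum_value starts over for the new value while seqnum keeps counting, so the difference changes.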
EDIT:
I think Jorge is right. Your data never goes back to a value once it has changed, so you can simply do:
select t.*,
       row_number() over (partition by id, value order by date) as grp_seqnum
from t;
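The shortcut depends on that assumption. As a rough illustration, the sketch below runs it (with - 1 added to get a zero-based count) on hypothetical rows, not taken from the question, where value 2 comes back after a 3. The table name t and column name dt are again assumptions, and sqlite3 needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical rows (not from the question): value 2 reappears after a 3.
rows = [(1, "01-01", 2), (1, "01-02", 2), (1, "01-03", 3), (1, "01-04", 2)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, dt text, value integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

simple = """
select dt, value,
       row_number() over (partition by id, value order by dt) - 1 as consec
from t
order by id, dt
"""

# The last row gets 2, because all three rows with value 2 share one partition.
simple_counts = [consec for _, _, consec in conn.execute(simple)]
print(simple_counts)
```

A true consecutive count for these rows would be 0, 1, 0, 0, so the shortcut is only safe when a value cannot recur later within the same id.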
If the values really are always increasing within each id, this should work:
row_number() over (partition by id, value order by date) - 1
Otherwise, Teradata has an extension to Standard SQL for exactly this case:
row_number()
over (partition by id
      order by date
      RESET WHEN MIN(value) -- previous value not equal to current
                 OVER (partition by id
                       order by date
                       rows between 1 preceding and 1 preceding) <> value
     ) - 1
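RESET WHEN is Teradata-specific. On other databases, one possible substitute (a different technique, not the Teradata syntax above) is to flag each row whose value differs from the previous one with lag(), take a running sum of the flags to number the runs, and then count rows inside each run. The sketch below tries this on the question's sample data with Python's sqlite3 module (SQLite 3.25 or later); the table name t, the column name dt, and the zero-padded dates are assumptions.

```python
import sqlite3

# The question's sample rows as (id, dt, value, expected consec_duplicates).
# Dates are rewritten as zero-padded text so they sort correctly.
sample = [
    (1, "01-01", 1, 0), (1, "01-02", 2, 0), (1, "01-03", 2, 1),
    (1, "01-04", 2, 2), (1, "01-05", 3, 0), (1, "01-06", 3, 1),
    (2, "01-14", 1, 0), (2, "01-15", 2, 0), (2, "01-16", 3, 0),
    (2, "01-17", 3, 1), (2, "01-18", 4, 0), (2, "01-19", 5, 0),
    (3, "01-04", 1, 0), (3, "01-05", 2, 0), (3, "01-06", 2, 1),
    (3, "01-07", 2, 2), (3, "01-08", 2, 3), (3, "01-09", 3, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, dt text, value integer)")
conn.executemany("insert into t values (?, ?, ?)", [r[:3] for r in sample])

query = """
with flagged as (
    -- is_start = 1 when the value differs from the previous row (or there is none)
    select id, dt, value,
           case when lag(value) over (partition by id order by dt) = value
                then 0 else 1 end as is_start
    from t
),
islands as (
    -- the running sum of start flags numbers the runs within each id
    select id, dt, value,
           sum(is_start) over (partition by id order by dt) as grp
    from flagged
)
select id, dt, value,
       row_number() over (partition by id, grp order by dt) - 1 as consec_duplicates
from islands
order by id, dt
"""

computed = [row[3] for row in conn.execute(query)]
expected = [r[3] for r in sample]
print(computed == expected)
```

Because each run gets its own number, this variant does not rely on the values always increasing.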