Remove duplicate groups in SQL
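In SQL Server, I get the following dataset after a query. I need to group this data by Uid, remove the "duplicate" groups, and return the group with the most recent column D. I also need to return only the 2 most recent de-duplicated groups. Two groups are duplicates of each other if all of the following hold:
- They have the same number of rows
- Their A, B, C columns are the same
- Their rows are in the same order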
Uid | A | B | C | D1 | D2 |
---|---|---|---|---|---|
1 | 6 | 1 | 2 | 2021-02-19 | 2021-02-19 09:00:00 |
1 | 6 | 2 | 1 | 2021-02-19 | 2021-02-19 10:00:00 |
1 | 6 | 1 | 2 | 2021-02-19 | 2021-02-19 11:00:00 |
2 | 6 | 1 | 2 | 2021-01-19 | 2021-01-19 09:00:00 |
2 | 6 | 2 | 1 | 2021-01-19 | 2021-01-19 10:00:00 |
3 | 6 | 1 | 2 | 2020-02-19 | 2020-02-19 09:00:00 |
3 | 6 | 2 | 1 | 2020-02-19 | 2020-02-19 10:00:00 |
3 | 6 | 1 | 2 | 2020-02-19 | 2020-02-19 11:00:00 |
4 | 11 | 4 | 5 | 2000-10-05 | 2000-10-05 09:00:00 |
For example, in the dataset above, Uid 1 and Uid 3 are duplicates, and Uid 1 is the more recent of the two. So the dataset above should return:
Uid | A | B | C | D1 | D2 |
---|---|---|---|---|---|
1 | 6 | 1 | 2 | 2021-02-19 | 2021-02-19 09:00:00 |
1 | 6 | 2 | 1 | 2021-02-19 | 2021-02-19 10:00:00 |
1 | 6 | 1 | 2 | 2021-02-19 | 2021-02-19 11:00:00 |
2 | 6 | 1 | 2 | 2021-01-19 | 2021-01-19 09:00:00 |
2 | 6 | 2 | 1 | 2021-01-19 | 2021-01-19 10:00:00 |
I tried the following window function:
FROM (
    SELECT
        A,
        B,
        C,
        D1,
        D2,
        ROW_NUMBER() OVER (PARTITION BY Uid ORDER BY D2 DESC) AS rn
    ....
WHERE rn = 1
But this does not let me group by Uid. How can I achieve this?
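In the query below, the subquery temp builds a comma-separated list for each of A, B and C, one row per Uid. The subquery temp2 then partitions by those A, B, C lists and ranks the Uids in each partition by date. The final outer query keeps only rank 1 and joins back to the table to display its rows.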
-- temp:  one row per Uid, with A, B and C each flattened into a comma-separated list
-- temp2: ranks the Uids that share the same lists, most recent D1 first
SELECT t.Uid, t.A, t.B, t.C, t.D1, t.D2
FROM (
    SELECT Uid, A, B, C, D1,
           RANK() OVER (PARTITION BY A, B, C ORDER BY D1 DESC) AS rank
    FROM (
        SELECT Uid,
               A = STUFF((SELECT ', ' + CAST(A AS VARCHAR(MAX))
                          FROM Table1 t2
                          WHERE t2.Uid = t1.Uid
                          ORDER BY t2.D2  -- fixed list order, so row order is compared too
                          FOR XML PATH('')), 1, 2, ''),
               B = STUFF((SELECT ', ' + CAST(B AS VARCHAR(MAX))
                          FROM Table1 t2
                          WHERE t2.Uid = t1.Uid
                          ORDER BY t2.D2
                          FOR XML PATH('')), 1, 2, ''),
               C = STUFF((SELECT ', ' + CAST(C AS VARCHAR(MAX))
                          FROM Table1 t2
                          WHERE t2.Uid = t1.Uid
                          ORDER BY t2.D2
                          FOR XML PATH('')), 1, 2, ''),
               CAST(MAX([D1]) AS DATE) AS D1
        FROM Table1 t1
        GROUP BY Uid
    ) AS temp
) AS temp2
JOIN Table1 t
  ON temp2.Uid = t.Uid
 AND temp2.D1 = t.D1
WHERE temp2.rank = 1  -- keep only the most recent Uid among identical groups
Here is the DB Fiddle link: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=de2127330c2e60d3733bfc9548504142
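As a possible alternative to the STUFF / FOR XML PATH approach, the same idea can be sketched with STRING_AGG, which is available from SQL Server 2017 (the version the fiddle targets). This is only a sketch under some assumptions: the table is named Table1 as in the query above, D2 defines both the row order inside a Uid and how recent a group is, and the A, B, C values never contain ':' or ','. The CTE names sig, ranked and top2 are hypothetical names chosen for illustration. It also applies the "2 most recent groups" limit that the question asks for.

```sql
WITH sig AS (
    -- One row per Uid: an ordered signature of its (A, B, C) rows,
    -- plus the latest D2 seen for that Uid.
    SELECT Uid,
           STRING_AGG(CONCAT(A, ':', B, ':', C), ',')
               WITHIN GROUP (ORDER BY D2) AS signature,
           MAX(D2) AS latest
    FROM Table1
    GROUP BY Uid
),
ranked AS (
    -- Among Uids with an identical signature, number them newest first.
    SELECT Uid, latest,
           ROW_NUMBER() OVER (PARTITION BY signature ORDER BY latest DESC) AS rn
    FROM sig
),
top2 AS (
    -- Keep the newest Uid per signature, then the 2 most recent of those.
    SELECT TOP (2) Uid
    FROM ranked
    WHERE rn = 1
    ORDER BY latest DESC
)
SELECT t.Uid, t.A, t.B, t.C, t.D1, t.D2
FROM Table1 AS t
JOIN top2 ON top2.Uid = t.Uid
ORDER BY t.Uid, t.D2;
```

Two Uids with a different number of rows can never produce the same signature, so the "same number of rows" condition does not need a separate check. If a Uid can have many rows, casting the CONCAT result to VARCHAR(MAX) avoids STRING_AGG's 8000-byte limit on non-MAX inputs.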