SQL: Removing duplicate rows while retaining the row with the highest value in another column
Suppose I have a table with the following test data:
SOID  SO_Name  SO_Desc        PRIORITY  ADE_PRIORITIZED  DEPLOY_DATE  ENV
123   SO1      SO1 Desc1      111       Y                01-JAN-01    0
123   SO1      SO1 Desc1      111       Y                01-JAN-01    1
123   SO1      SO1 Desc1      111       Y                01-JAN-01    2
123   SO1      SO1 Desc1      111       Y                01-JAN-01    3
987   SO1      SO1 Desc1      111       Y                01-JAN-01    0
987   SO1      SO1 Desc1      111       Y                01-JAN-16    1
987   SO1      SO1 Desc1      111       Y                21-JAN-17    2
987   SO1      SO1 Desc1      111       Y                01-JAN-17    3
121   SO121    SO121 Desc121  111       Y                01-JAN-17    0
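I want to delete the duplicate rows for each soid (duplicates can be determined by 4 columns: so_name, so_desc, priority, ade_prioritized), keeping the row with the highest deploy_date.
I used this query, but it did not delete any rows.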
delete from so_test a
where a.deploy_date < (
select max(b.deploy_date) from so_test b where a.soid = b.soid
);
0 rows deleted
The final result I expect is:
SOID  SO_Name  SO_Desc    PRIORITY  ADE_PRIORITIZED  DEPLOY_DATE  ENV
123   SO1      SO1 Desc1  111       Y                01-JAN-01    0
987   SO1      SO1 Desc1  111       Y                21-JAN-17    2
987   SO1      SO1 Desc1  111       Y                21-JAN-17    2
What could be the problem?
Can this be done without a CTE?
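Using with (a common table expression) and row_number(), you can identify and easily handle the duplicates:
When using CTEs, you can only execute one statement after the expression (unless you are chaining CTEs or using multiple CTEs).
In the code sample below, first check the output with the select query; then, if you want to go ahead, comment out the select query and uncomment the delete query.
rextester link: http://rextester.com/UFQQ51693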
with cte as (
select
*
, rn = row_number() over (
partition by soid
order by deploy_date desc
)
from [so_test]
)
/* --------------------------------------------------------------
-- This returns all rows whose soid has duplicates,
-- along with the row number (rn), so you can see which rows
-- would be affected by the following actions
-------------------------------------------------------------- */
/*
select o.*
from cte as o
where exists (
select 1
from cte as i
where o.soid = i.soid
and i.rn>1
);
--*/
/* --------------------------------------------------------------
-- Remove duplicates by deleting all of the duplicates
-- where the row number (rn) is greater than 1
-- without deleting the first row of the duplicates.
-------------------------------------------------------------- */
--/*
delete
from cte
where cte.rn > 1
--*/
rextester results after the delete:
+------+---------+---------------+----------+-----------------+---------------------+-----+
| soid | so_name | so_desc | priority | ade_prioritized | deploy_date | env |
+------+---------+---------------+----------+-----------------+---------------------+-----+
| 123 | SO1 | SO1_Desc1 | 111 | Y | 01.01.2001 00:00:00 | 0 |
| 987 | SO1 | SO1_Desc1 | 111 | Y | 21.01.2017 00:00:00 | 2 |
| 121 | SO121 | SO121_Desc121 | 111 | Y | 01.01.2017 00:00:00 | 0 |
+------+---------+---------------+----------+-----------------+---------------------+-----+
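On the question of doing this without a CTE: the statement above relies on SQL Server syntax (the rn = ... alias, the [so_test] brackets, and deleting through the CTE). The date format and the "0 rows deleted" message in the question suggest Oracle, although that is an assumption. Under that assumption, a minimal sketch of the same idea with an inline view and rowid instead of a CTE might look like this. The extra env sort key is only an illustrative tiebreaker for rows that share the same deploy_date; it is not part of the original answer.

```sql
-- Sketch for Oracle (assumed): delete every row except the one with the
-- latest deploy_date per soid, without using a WITH clause.
delete from so_test
where rowid in (
    select rid
    from (
        select rowid as rid,
               row_number() over (
                   partition by soid
                   -- env is a hypothetical tiebreaker so the kept row is
                   -- deterministic when deploy_date values are equal
                   order by deploy_date desc, env
               ) as rn
        from so_test
    )
    where rn > 1   -- rn = 1 is the row to keep in each soid group
);
```

To treat rows as duplicates based on the four descriptive columns instead, replace partition by soid with partition by so_name, so_desc, priority, ade_prioritized.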
An example based on saving the non-duplicate rows into a new table.
create table so_test_nodups
as
with dups as
( select soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env,
row_number() over ( partition by so_name, so_desc, priority, ade_prioritized order by deploy_date desc ) rn
from so_test
)
select soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env
from dups
where rn=1
Querying the so_test_nodups table:
select * from so_test_nodups
SOID SO_NAME SO_DESC PRIORITY A DEPLOY_DA ENV
---------- ---------- -------------------- ---------- - --------- ----------
123 SO1 SO1 Desc1 111 Y 01-JAN-01 0
121 SO121 SO121 Desc121 111 Y 01-JAN-17 0
Results after the edit made to the question:
SOID SO_NAME SO_DESC PRIORITY A DEPLOY_DA ENV
---------- ---------- -------------------- ---------- - --------- ----------
987 SO1 SO1 Desc1 111 Y 21-JAN-17 2
121 SO121 SO121 Desc121 111 Y 01-JAN-17 0
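As a quick sanity check, one could confirm that no combination of the four duplicate-defining columns appears more than once in the new table. This is a sketch that reuses the table and column names from the example above; the cnt alias is only illustrative.

```sql
-- Lists any remaining duplicate groups; an empty result means that
-- each (so_name, so_desc, priority, ade_prioritized) combination is unique.
select so_name, so_desc, priority, ade_prioritized, count(*) as cnt
from so_test_nodups
group by so_name, so_desc, priority, ade_prioritized
having count(*) > 1;
```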