SQL: Removing duplicate rows while retaining the row with the highest value in another column

Question

Suppose I have a table, so_test, with the following data:

SOID SO_Name   SO_Desc     PRIORITY  ADE_PRIORITIZED  DEPLOY_DATE  ENV
123  SO1      SO1 Desc1      111      Y               01-JAN-01     0
123  SO1      SO1 Desc1      111      Y               01-JAN-01     1
123  SO1      SO1 Desc1      111      Y               01-JAN-01     2
123  SO1      SO1 Desc1      111      Y               01-JAN-01     3
987  SO1      SO1 Desc1      111      Y               01-JAN-01     0
987  SO1      SO1 Desc1      111      Y               01-JAN-16     1
987  SO1      SO1 Desc1      111      Y               21-JAN-17     2
987  SO1      SO1 Desc1      111      Y               01-JAN-17     3
121  SO121    SO121 Desc121  111      Y               01-JAN-17     0

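For each soid, I want to delete the duplicate rows (duplicates are determined by 4 columns: so_name, so_desc, priority, ade_prioritized), keeping the row with the highest deploy_date.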
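Before deleting anything, it can help to list which groups actually contain duplicates under this definition. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for the real database (an assumption), with deploy_date stored as ISO-8601 text so that string order matches date order:

```python
import sqlite3

# Sample data from the question. deploy_date is stored as ISO-8601 text
# (an assumption) so that comparing strings compares dates.
ROWS = [
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 1),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 2),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 3),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2016-01-01", 1),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-21", 2),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-01", 3),
    (121, "SO121", "SO121 Desc121", 111, "Y", "2017-01-01", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE so_test (soid INTEGER, so_name TEXT, so_desc TEXT, "
    "priority INTEGER, ade_prioritized TEXT, deploy_date TEXT, env INTEGER)"
)
conn.executemany("INSERT INTO so_test VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# One group per soid plus the four duplicate-defining columns;
# HAVING keeps only the groups that have more than one row.
dup_groups = conn.execute("""
    SELECT soid, COUNT(*) AS n, MAX(deploy_date) AS latest
    FROM so_test
    GROUP BY soid, so_name, so_desc, priority, ade_prioritized
    HAVING COUNT(*) > 1
    ORDER BY soid
""").fetchall()
print(dup_groups)  # [(123, 4, '2001-01-01'), (987, 4, '2017-01-21')]
```

Soid 121 has a single row, so it does not appear. For soid 123, every row shares the same latest deploy_date.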

I used this query, but it does not delete any rows:

delete from so_test a 
where a.deploy_date < (
  select max(b.deploy_date) from so_test b where a.soid = b.soid
);

0 rows deleted
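One thing this query cannot do even when it runs as intended: because it uses a strict `<`, rows that tie on the latest deploy_date are never deleted, and all four rows for soid 123 share the same date. The sketch below runs the same statement against Python's sqlite3 module (an assumption for illustration, not the poster's Oracle database; it is not meant to explain the "0 rows deleted" result above), with deploy_date stored as ISO-8601 text:

```python
import sqlite3

# Sample data from the question; ISO-8601 dates are an assumption.
ROWS = [
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 1),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 2),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 3),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2016-01-01", 1),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-21", 2),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-01", 3),
    (121, "SO121", "SO121 Desc121", 111, "Y", "2017-01-01", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE so_test (soid INTEGER, so_name TEXT, so_desc TEXT, "
    "priority INTEGER, ade_prioritized TEXT, deploy_date TEXT, env INTEGER)"
)
conn.executemany("INSERT INTO so_test VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# The question's statement. The outer table is not aliased here, so the
# correlated subquery refers to it by name (so_test) and to the inner copy as b.
cur = conn.execute("""
    DELETE FROM so_test
    WHERE deploy_date < (
        SELECT MAX(b.deploy_date) FROM so_test b WHERE b.soid = so_test.soid
    )
""")
deleted = cur.rowcount
remaining = conn.execute(
    "SELECT soid, env FROM so_test ORDER BY soid, env"
).fetchall()
print(deleted)    # 3: only the three older rows for soid 987
print(remaining)  # all four soid 123 rows survive, since none is older than the max
```

Removing the tied rows needs something that tells them apart, such as the distinct numbers that row_number() assigns in the answers below.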

The final result I expect is:

SOID SO_Name   SO_Desc     PRIORITY  ADE_PRIORITIZED  DEPLOY_DATE  ENV
123  SO1      SO1 Desc1      111      Y               01-JAN-01     0
987  SO1      SO1 Desc1      111      Y               21-JAN-17     2
987  SO1      SO1 Desc1      111      Y               21-JAN-17     2

What could be the problem? Can this be done without a CTE?
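On the second question, a CTE is not strictly required. One possible approach (a sketch, not taken from the answers below) deletes a row whenever another row with the same soid and the same four columns is preferable: it has a later deploy_date or, on a tie, a lower env. Using env as the tie-breaker is an assumption; any column that is unique within a group would do. As before, this runs against Python's sqlite3 module with ISO-8601 dates (both assumptions):

```python
import sqlite3

# Sample data from the question; ISO-8601 dates are an assumption.
ROWS = [
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 1),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 2),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 3),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2016-01-01", 1),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-21", 2),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-01", 3),
    (121, "SO121", "SO121 Desc121", 111, "Y", "2017-01-01", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE so_test (soid INTEGER, so_name TEXT, so_desc TEXT, "
    "priority INTEGER, ade_prioritized TEXT, deploy_date TEXT, env INTEGER)"
)
conn.executemany("INSERT INTO so_test VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Delete a row if a "better" row exists in the same group: same soid and the
# same four duplicate-defining columns, with a later deploy_date, or the same
# deploy_date and a lower env (env as tie-breaker is an assumption).
conn.execute("""
    DELETE FROM so_test
    WHERE EXISTS (
        SELECT 1
        FROM so_test b
        WHERE b.soid = so_test.soid
          AND b.so_name = so_test.so_name
          AND b.so_desc = so_test.so_desc
          AND b.priority = so_test.priority
          AND b.ade_prioritized = so_test.ade_prioritized
          AND (b.deploy_date > so_test.deploy_date
               OR (b.deploy_date = so_test.deploy_date AND b.env < so_test.env))
    )
""")
kept = conn.execute(
    "SELECT soid, deploy_date, env FROM so_test ORDER BY soid"
).fetchall()
print(kept)  # [(121, '2017-01-01', 0), (123, '2001-01-01', 0), (987, '2017-01-21', 2)]
```

The best row in each group never has a better row, so exactly one row per group survives. The `=` comparisons treat NULLs as unequal, so rows with NULL in any of those columns would never be matched as duplicates.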

Answer 1

Using a common table expression (with) and row_number(), you can easily identify and handle the duplicates.

When using CTEs, only one statement can follow the expression (unless you are chaining CTEs or defining multiple CTEs).

In the code sample below, first check the output with the select query; then, if you want to go ahead, comment out the select query and uncomment the delete query.

rextester link: http://rextester.com/UFQQ51693

with cte as (
  select   
      *
    , rn = row_number() over (
            partition by soid 
            order by deploy_date desc
            )
    from [so_test]
)
/* --------------------------------------------------------------
-- This returns all rows whose soid has duplicates,
-- along with the row number (rn), so you can see which rows
-- would be affected by the delete that follows
-------------------------------------------------------------- */
/*
select o.*
  from cte as o
  where exists (
      select 1
        from cte as i
        where o.soid = i.soid
          and i.rn>1
      );
--*/
/* --------------------------------------------------------------
-- Remove duplicates by deleting every row whose row number
-- (rn) is greater than 1, which keeps the first row (latest
-- deploy_date) for each soid.
-------------------------------------------------------------- */
--/*
delete 
  from cte 
  where cte.rn > 1 
--*/
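The delete above relies on SQL Server letting you delete through a CTE. SQLite does not, so a port to Python's sqlite3 module (an assumption for illustration; window functions need SQLite 3.25 or later) has to delete the base rows by rowid instead. Two further differences from the answer, both assumptions: env is added to the ORDER BY as a tie-breaker, because with four rows sharing one deploy_date, row_number() alone does not guarantee which soid 123 row survives; and the WITH clause is repeated for each statement, because a CTE only exists for the single statement it is attached to.

```python
import sqlite3

# Sample data from the question; ISO-8601 dates are an assumption.
ROWS = [
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 1),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 2),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 3),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2016-01-01", 1),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-21", 2),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-01", 3),
    (121, "SO121", "SO121 Desc121", 111, "Y", "2017-01-01", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE so_test (soid INTEGER, so_name TEXT, so_desc TEXT, "
    "priority INTEGER, ade_prioritized TEXT, deploy_date TEXT, env INTEGER)"
)
conn.executemany("INSERT INTO so_test VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# Rank rows within each soid, newest deploy_date first; env breaks ties
# (an assumption). The CTE is prepended to each statement that needs it.
RANKED = """
    WITH cte AS (
        SELECT rowid AS rid, soid, env,
               row_number() OVER (PARTITION BY soid
                                  ORDER BY deploy_date DESC, env) AS rn
        FROM so_test
    )
"""

# Preview: the rows that the delete would remove (rn > 1).
to_delete = conn.execute(
    RANKED + "SELECT soid, env FROM cte WHERE rn > 1 ORDER BY soid, env"
).fetchall()

# Delete the base-table rows that the CTE ranked below first place.
conn.execute(
    RANKED + "DELETE FROM so_test WHERE rowid IN (SELECT rid FROM cte WHERE rn > 1)"
)
kept = conn.execute(
    "SELECT soid, deploy_date, env FROM so_test ORDER BY soid"
).fetchall()

# Outside a statement's own WITH clause, "cte" does not exist.
try:
    conn.execute("SELECT COUNT(*) FROM cte")
    cte_visible = True
except sqlite3.OperationalError:  # no such table: cte
    cte_visible = False

print(to_delete)  # [(123, 1), (123, 2), (123, 3), (987, 0), (987, 1), (987, 3)]
print(kept)       # [(121, '2017-01-01', 0), (123, '2001-01-01', 0), (987, '2017-01-21', 2)]
```

With the tie-breaker in place, the surviving rows match the rextester result shown below.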

Result on rextester after the delete:

+------+---------+---------------+----------+-----------------+---------------------+-----+
| soid | so_name |    so_desc    | priority | ade_prioritized |     deploy_date     | env |
+------+---------+---------------+----------+-----------------+---------------------+-----+
|  123 | SO1     | SO1_Desc1     |      111 | Y               | 01.01.2001 00:00:00 |   0 |
|  987 | SO1     | SO1_Desc1     |      111 | Y               | 21.01.2017 00:00:00 |   2 |
|  121 | SO121   | SO121_Desc121 |      111 | Y               | 01.01.2017 00:00:00 |   0 |
+------+---------+---------------+----------+-----------------+---------------------+-----+

Answer 2

An example that saves the non-duplicate rows into a new table.

create table so_test_nodups 
as
with dups as 
( select soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env,  
        row_number() over ( partition by so_name, so_desc, priority, ade_prioritized order by deploy_date desc ) rn 
  from so_test 
) 
select  soid, so_name, so_desc, priority, ade_prioritized, deploy_date, env 
from dups 
where rn=1

Querying the so_test_nodups table:

select * from so_test_nodups

      SOID SO_NAME    SO_DESC                PRIORITY A DEPLOY_DA        ENV
---------- ---------- -------------------- ---------- - --------- ----------
       123 SO1        SO1 Desc1                   111 Y 01-JAN-01          0
       121 SO121      SO121 Desc121               111 Y 01-JAN-17          0

Results after the question was edited:

      SOID SO_NAME    SO_DESC                PRIORITY A DEPLOY_DA        ENV
---------- ---------- -------------------- ---------- - --------- ----------
       987 SO1        SO1 Desc1                   111 Y 21-JAN-17          2
       121 SO121      SO121 Desc121               111 Y 01-JAN-17          0
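The listing above has only two rows because this query partitions by so_name, so_desc, priority and ade_prioritized only, not by soid, so the 123 and 987 rows fall into one group and only the newest of them (987, 21-JAN-17) is kept. The question asks for duplicates per soid; adding soid to the PARTITION BY keeps one row per soid instead. A sketch comparing the two partitionings with Python's sqlite3 module (an assumption; window functions need SQLite 3.25 or later), with ISO-8601 dates and env as a tie-breaker (both assumptions):

```python
import sqlite3

# Sample data from the question; ISO-8601 dates are an assumption.
ROWS = [
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 1),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 2),
    (123, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 3),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2001-01-01", 0),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2016-01-01", 1),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-21", 2),
    (987, "SO1", "SO1 Desc1", 111, "Y", "2017-01-01", 3),
    (121, "SO121", "SO121 Desc121", 111, "Y", "2017-01-01", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE so_test (soid INTEGER, so_name TEXT, so_desc TEXT, "
    "priority INTEGER, ade_prioritized TEXT, deploy_date TEXT, env INTEGER)"
)
conn.executemany("INSERT INTO so_test VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

FOUR_COLS = "so_name, so_desc, priority, ade_prioritized"


def keep_latest(partition_by):
    """Return the first-ranked row of each partition (newest deploy_date, then lowest env).

    partition_by is interpolated into the SQL, so it must be a trusted,
    hard-coded column list, never user input.
    """
    return conn.execute(f"""
        SELECT soid, deploy_date, env
        FROM (
            SELECT soid, deploy_date, env,
                   row_number() OVER (PARTITION BY {partition_by}
                                      ORDER BY deploy_date DESC, env) AS rn
            FROM so_test
        )
        WHERE rn = 1
        ORDER BY soid
    """).fetchall()


by_four = keep_latest(FOUR_COLS)                      # Answer 2's partitioning
by_soid_and_four = keep_latest("soid, " + FOUR_COLS)  # one row per soid
print(by_four)           # [(121, '2017-01-01', 0), (987, '2017-01-21', 2)]
print(by_soid_and_four)  # [(121, '2017-01-01', 0), (123, '2001-01-01', 0), (987, '2017-01-21', 2)]
```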


Tags: mysql, sql-server, oracle, duplicates, sql-delete