Removing partial duplicate rows (duplicate data in only two columns) from a Microsoft SQL Server 2012 table

Question

I have a table with around 7,000 rows. Below is a small sample of it:

ORG_ID     Entity_CLASS_ID     Entity_ID     ORGANISATION_ID     ORGANISATION_ROLE_ID    COLUMN_X
-------------------------------------------------------------------------------------------------
781        3                   998           896                 4                       1          
2098       3                   998           3191                4                       66   
3808       4                   998           3191                4                       6555     
780        3                   997           2402                4                       34234     
3807       4                   997           2061                4                       234    
2097       3                   997           2061                4                       6756

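You will notice that there are multiple rows for each Entity_ID. For each Entity_ID there are two or more different Entity_CLASS_ID values.

You can see that in some cases two rows for the same Entity_ID also match on ORGANISATION_ID but have different Entity_CLASS_ID values: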

ORG_ID     Entity_CLASS_ID     Entity_ID     ORGANISATION_ID     ORGANISATION_ROLE_ID    COLUMN_X     
-------------------------------------------------------------------------------------------------
2098       3                   998           3191                4                       66     
3808       4                   998           3191                4                       6555

In this situation I want to delete the row where Entity_CLASS_ID = 3.

The cleaned-up table should then be:

ORG_ID     Entity_CLASS_ID     Entity_ID     ORGANISATION_ID     ORGANISATION_ROLE_ID    COLUMN_X
-------------------------------------------------------------------------------------------------
781        3                   998           896                 4                       1          
3808       4                   998           3191                4                       6555     
780        3                   997           2402                4                       34234     
3807       4                   997           2061                4                       234

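I hope I have explained that clearly!

I tried to solve this myself in code, but the mix of partial matches kept me from getting anywhere near a solution.

Thanks in advance for any help with this.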

Answer 1

We can try using a CTE for this:

WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY Entity_ID, ORGANISATION_ID) cnt,
        MAX(Entity_CLASS_ID) OVER (PARTITION BY Entity_ID, ORGANISATION_ID) max_ecid
    FROM yourTable
)

DELETE
FROM cte
WHERE
    Entity_CLASS_ID = 3 AND   -- identifies the duplicate
    cnt = 2 AND               -- must occur in a pair
    max_ecid = 4;             -- the other record must be 4

Here is a running demo showing that the correct records are being identified for deletion:

Demo
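To check the filter before deleting anything, the same CTE can be paired with a SELECT instead of a DELETE. The sketch below is only an illustration: the table variable @yourTable and its INT column types are assumptions, loaded with the six sample rows from the question. With that data it should return ORG_ID 2097 and 2098, the two rows the question wants removed.

```sql
-- Hypothetical stand-in for the real table: the six sample rows from the question.
-- Column types are assumed to be INT.
DECLARE @yourTable TABLE (
    ORG_ID               INT,
    Entity_CLASS_ID      INT,
    Entity_ID            INT,
    ORGANISATION_ID      INT,
    ORGANISATION_ROLE_ID INT,
    COLUMN_X             INT
);

INSERT INTO @yourTable VALUES
    (781,  3, 998, 896,  4, 1),
    (2098, 3, 998, 3191, 4, 66),
    (3808, 4, 998, 3191, 4, 6555),
    (780,  3, 997, 2402, 4, 34234),
    (3807, 4, 997, 2061, 4, 234),
    (2097, 3, 997, 2061, 4, 6756);

-- Same CTE and filter as the DELETE, but only listing the candidate rows.
WITH cte AS (
    SELECT *,
        COUNT(*) OVER (PARTITION BY Entity_ID, ORGANISATION_ID) AS cnt,
        MAX(Entity_CLASS_ID) OVER (PARTITION BY Entity_ID, ORGANISATION_ID) AS max_ecid
    FROM @yourTable
)
SELECT ORG_ID, Entity_CLASS_ID, Entity_ID, ORGANISATION_ID
FROM cte
WHERE Entity_CLASS_ID = 3   -- the row to remove
  AND cnt = 2               -- only exact pairs
  AND max_ecid = 4          -- the partner row has class 4
ORDER BY ORG_ID;
```

Note that the cnt = 2 condition limits this to exact pairs, so a group of three or more rows sharing Entity_ID and ORGANISATION_ID would be left untouched.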

Answer 2

You can also use exists:

delete from t
where t.Entity_CLASS_ID = 3 and
      exists (select 1
              from t t2
              where t2.Entity_ID = t.Entity_ID and
                    t2.ORGANISATION_ID = t.ORGANISATION_ID and
                    t2.Entity_CLASS_ID <> 3
             );

This reads as an almost direct translation of how the question is phrased.
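One way to try this safely is to run the delete inside a transaction, list the removed rows with an OUTPUT clause, and roll back. The following is a minimal sketch under assumptions: the temp table #t and its INT column types are made up for illustration and hold the six sample rows from the question. With that data the OUTPUT should list ORG_ID 2098 and 2097, and the follow-up SELECT should show the four rows from the expected result (781, 3808, 780, 3807).

```sql
-- Hypothetical temp table standing in for the real table t (column types assumed).
CREATE TABLE #t (
    ORG_ID               INT,
    Entity_CLASS_ID      INT,
    Entity_ID            INT,
    ORGANISATION_ID      INT,
    ORGANISATION_ROLE_ID INT,
    COLUMN_X             INT
);

INSERT INTO #t VALUES
    (781,  3, 998, 896,  4, 1),
    (2098, 3, 998, 3191, 4, 66),
    (3808, 4, 998, 3191, 4, 6555),
    (780,  3, 997, 2402, 4, 34234),
    (3807, 4, 997, 2061, 4, 234),
    (2097, 3, 997, 2061, 4, 6756);

BEGIN TRANSACTION;

-- Same EXISTS delete as above; OUTPUT returns the rows that were removed.
DELETE t
OUTPUT deleted.ORG_ID, deleted.Entity_CLASS_ID, deleted.Entity_ID, deleted.ORGANISATION_ID
FROM #t AS t
WHERE t.Entity_CLASS_ID = 3
  AND EXISTS (SELECT 1
              FROM #t AS t2
              WHERE t2.Entity_ID = t.Entity_ID
                AND t2.ORGANISATION_ID = t.ORGANISATION_ID
                AND t2.Entity_CLASS_ID <> 3);

-- Rows that remain after the delete.
SELECT * FROM #t ORDER BY Entity_ID DESC, ORG_ID;

-- Undo the trial run; use COMMIT TRANSACTION instead once the result looks right.
ROLLBACK TRANSACTION;

DROP TABLE #t;
```

Unlike a filter that requires exactly two rows per group, the EXISTS form removes the class 3 row whenever any row with another class shares the same Entity_ID and ORGANISATION_ID.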
