Deleting multiple duplicate rows from a database even if some of the columns may be NULL

Question

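I have inherited a database with a table that contains a lot of duplicates because it lacks a unique primary key. Sadly, I need to remove all but one of each set of duplicates before I can add a primary key.

I found a lot of great answers here and followed all of the advice I read.

This is the query I ended up with: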

WITH cte
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY storyId, storyDescription, genreId, authorId, submissionDate, submittedBy, submissionUrl 
                                       ORDER BY ( SELECT 0)) RN
         FROM   storyList)
DELETE FROM cte
WHERE  RN > 1;

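It did remove about 90% of the duplicate entries. However, it does not delete rows that contain a NULL value in some of their columns.

I searched the other answers and comments for a similar problem, but could not find anything that deals with potential NULL values.

Is there a way to delete the remaining duplicate entries even if some of their columns may contain NULL values?

Thanks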

Answer 1

Delete them separately:

delete from storylist
    where storyId is null or storyDescription is null  or genreId is null or . . .

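However, this seems strange. Why isn't storyId a candidate primary key? Do you intend to use all of the columns?

EDIT:

I think you want storyId to be the key and, among the other columns, to prefer the rows with non-NULL values. If so: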

WITH cte as (
      SELECT ROW_NUMBER() OVER (PARTITION BY storyId 
                                    ORDER BY ( (CASE WHEN storyDescription IS NOT NULL THEN 1 ELSE 0 END) +
                                               (CASE WHEN genreId IS NOT NULL THEN 1 ELSE 0 END) +
                                               . . .
                                             ) DESC
                          ) as seqnum
      FROM storyList
     )
DELETE FROM cte
WHERE seqnum > 1;
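
To make the idea concrete, here is a minimal, self-contained sketch on a hypothetical table variable, @Stories. The sample values and the RowId column are assumptions made up for illustration and are not taken from the question. For each storyId it keeps the row with the most non-NULL columns, and the assumed RowId column breaks ties so that the outcome is deterministic.

```sql
-- Hypothetical sample data; RowId and all values are assumptions.
declare @Stories as Table (
  RowId Int Identity,            -- assumed surrogate column, used only as a tie-breaker
  storyId Int,
  storyDescription VarChar(32),
  genreId Int );
insert into @Stories ( storyId, storyDescription, genreId ) values
  ( 1, 'first story', 10 ), ( 1, 'first story', null ), ( 1, null, null ),
  ( 2, null, 20 ), ( 2, null, 20 ),
  ( 3, null, null );

-- Number the rows of each storyId; rows with more non-NULL columns come first.
with cte as (
  select Row_Number() over ( partition by storyId
                             order by ( case when storyDescription is not null then 1 else 0 end ) +
                                      ( case when genreId is not null then 1 else 0 end ) desc,
                                      RowId ) as seqnum
    from @Stories )
  delete from cte
    where seqnum > 1;

-- One row per storyId should remain; for storyId 1 it is the row
-- that has both storyDescription and genreId populated.
select RowId, storyId, storyDescription, genreId
  from @Stories
  order by RowId;
```

Without a tie-breaker in the ORDER BY, the row that survives among equally complete duplicates is arbitrary, which is usually acceptable when the rows are identical anyway.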

Answer 2

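This is too long for a comment, so here goes.

If I understand correctly, the following code demonstrates what you are trying to do. If I have misunderstood, could you post a minimal, reproducible example that demonstrates the issue? (Perhaps a SQLFiddle.)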

-- Sample data.
declare @Samples as Table ( SampleId Int Identity, SomeString VarChar(16), SomeInt Int );
insert into @Samples ( SomeString, SomeInt ) values
  ( 'foo', 3 ), ( 'foo', 9 ), ( 'foo', null ), ( 'foo', 9 ), ( 'foo', null ),
  ( 'bar', 6 ), ( 'bar', 6 ), ( 'bar', null ), ( 'bar', 6 ), ( 'bar', null ),
  ( null, null ), ( null, 6 ), ( null, null ), ( null, 6 ), ( null, null );
select SampleId, SomeString, SomeInt
  from @Samples
  order by SampleId;

-- Get row numbers just to show they are calculated correctly.
select SampleId, SomeString, SomeInt,
  Row_Number() over ( partition by SomeString, SomeInt order by SampleId ) as RN
  from @Samples
  order by SomeString, SomeInt, RN;

-- Delete duplicates.
with NumberedRows as (
  select -- SampleId, SomeString, SomeInt,
    Row_Number() over ( partition by SomeString, SomeInt order by SampleId ) as RN
    from @Samples )
  delete from NumberedRows
    where RN > 1;
  
-- Display the remainder.
select SampleId, SomeString, SomeInt
  from @Samples
  order by SampleId;
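
As a follow-up check, one way to confirm that no duplicates remain is to group by the same columns and look for groups with more than one row. GROUP BY, like PARTITION BY, places NULLs in the same group, so duplicates that contain NULLs are counted as well. The sketch below is self-contained; the table variable @Check and its data are hypothetical.

```sql
-- Hypothetical sample data (not from the question).
declare @Check as Table ( SomeString VarChar(16), SomeInt Int );
insert into @Check ( SomeString, SomeInt ) values
  ( 'foo', 3 ), ( 'foo', null ), ( 'foo', null ),
  ( null, null ), ( null, null ), ( null, 6 );

-- List every combination that still occurs more than once.
-- NULLs are grouped together, so ( 'foo', null ) and ( null, null ) are both reported.
select SomeString, SomeInt, Count(*) as Copies
  from @Check
  group by SomeString, SomeInt
  having Count(*) > 1;
```

If the same query returns no rows after the delete has run, every remaining row is unique across those columns.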

sql

tsql

sql-server-2008

sql-server-2012