Deleting multiple duplicate rows from a database even if some of the columns may be NULL
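I inherited a database containing a table that has a lot of duplicates because it lacks a unique primary key. Sadly, before I can add a primary key I need to delete all but one row from each set of duplicates.
I found a lot of great answers here and followed all the advice I read.
This is the query I ended up with: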
WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY storyId, storyDescription, genreId, authorId, submissionDate, submittedBy, submissionUrl
ORDER BY ( SELECT 0)) RN
FROM storyList)
DELETE FROM cte
WHERE RN > 1;
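It did delete 90% of the duplicate entries. However, it does not delete rows that contain NULL values in some of the columns.
Unfortunately, I searched other answers and comments for similar problems but could not find anything that deals with potential NULL values.
Is there a way to delete the remaining duplicate entries even if some of their columns may contain NULL values?
Thanks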
Delete them separately:
delete from storylist
where storyId is null or storyDescription is null or genreId is null or . . .
However, this seems strange. Why isn't storyid a candidate primary key? Do you intend to use all the columns?
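One way to check whether storyid could serve as the key is to count how many rows share each value. This is only a sketch: it assumes the table and column names from the question, and `RowsPerStory` is a name I made up for the output column.

```sql
-- Lists every storyId that appears on more than one row.
-- RowsPerStory is a hypothetical alias, not a column of storyList.
select storyId, Count(*) as RowsPerStory
  from storyList
  group by storyId
  having Count(*) > 1
  order by RowsPerStory desc;
```

If this returns no rows, storyId is already unique. If it still returns rows after the original DELETE has run, the remaining rows for a given storyId differ in at least one of the other columns.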
EDIT:
I think you want storyid to be the key value and to prefer non-NULL values in the other columns. If so:
WITH cte as (
SELECT ROW_NUMBER() OVER (PARTITION BY storyId
ORDER BY ( (CASE WHEN storyDescription IS NOT NULL THEN 1 ELSE 0 END) +
(CASE WHEN genreId IS NOT NULL THEN 1 ELSE 0 END) +
. . .
) DESC
) as seqnum
FROM storyList
)
DELETE FROM cte
WHERE seqnum > 1;
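To see the idea run end to end, here is a small self-contained sketch. The table variable `@storyList`, its three columns, and all the sample values are made up for illustration (only the column names are borrowed from the question). The real table has more columns, and each would add another CASE term to the ORDER BY.

```sql
-- Hypothetical sample data: values are invented, column names follow the question.
declare @storyList as Table ( storyId Int, storyDescription VarChar(32), genreId Int );
insert into @storyList ( storyId, storyDescription, genreId ) values
  ( 1, 'first', 10 ), ( 1, 'first', null ), ( 1, null, null ),
  ( 2, null, 20 ), ( 2, 'second', 20 );

-- Number the rows within each storyId, most non-NULL columns first.
with cte as (
  select Row_Number() over ( partition by storyId
           order by ( ( case when storyDescription is not null then 1 else 0 end ) +
                      ( case when genreId is not null then 1 else 0 end ) ) desc ) as seqnum
    from @storyList )
delete from cte
  where seqnum > 1;

-- Display what is left.
select storyId, storyDescription, genreId
  from @storyList
  order by storyId;
```

With this sample data the rows left over should be ( 1, 'first', 10 ) and ( 2, 'second', 20 ), that is, the most complete row for each storyId. If two rows tie on the number of non-NULL columns, which one survives is arbitrary.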
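Too long for a comment, so here goes.
If I understand correctly, the following code demonstrates what you are trying to do. If I have misunderstood, could you post a minimal, reproducible example that demonstrates the issue? (Perhaps a SQLFiddle.)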
-- Sample data.
declare @Samples as Table ( SampleId Int Identity, SomeString VarChar(16), SomeInt Int );
insert into @Samples ( SomeString, SomeInt ) values
( 'foo', 3 ), ( 'foo', 9 ), ( 'foo', null ), ( 'foo', 9 ), ( 'foo', null ),
( 'bar', 6 ), ( 'bar', 6 ), ( 'bar', null ), ( 'bar', 6 ), ( 'bar', null ),
( null, null ), ( null, 6 ), ( null, null ), ( null, 6 ), ( null, null );
select SampleId, SomeString, SomeInt
from @Samples
order by SampleId;
-- Get row numbers just to show they are calculated correctly.
select SampleId, SomeString, SomeInt,
Row_Number() over ( partition by SomeString, SomeInt order by SampleId ) as RN
from @Samples
order by SomeString, SomeInt, RN;
-- Delete duplicates.
with NumberedRows as (
select -- SampleId, SomeString, SomeInt,
Row_Number() over ( partition by SomeString, SomeInt order by SampleId ) as RN
from @Samples )
delete from NumberedRows
where RN > 1;
-- Display the remainder.
select SampleId, SomeString, SomeInt
from @Samples
order by SampleId;