How to find duplicates in a table with no primary key or ID field?

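I inherited a SQL Server database that contains duplicate data. I need to find and delete the duplicate rows, but there is no id field, so I'm not sure how to identify them.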

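Normally I would use a LEFT JOIN to compare the table with itself and check that all fields are the same except the ID field, using table1.id <> table2.id. Without an ID, I don't know how to find the duplicate rows without each row also matching itself.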

TABLE:

productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null

Sample data:

1, 3, "started", "2016-06-15 04:23:12.000"
2, 3, "started", "2016-06-15 04:21:12.000"
1, 3, "started", "2016-06-15 04:23:12.000"
1, 3, "done", "2016-06-15 04:23:12.000"

In this sample, only rows 1 and 3 are duplicates.

How do I find the duplicates?

Use GROUP BY with HAVING: group on every column and keep only the groups that contain more than one row.

select 
    productId 
  , categoryId 
  , state
  , dateDone
  , count(*)
from your_table 
group by productId ,categoryId ,state, dateDone
having count(*) >1
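
As a rough illustration, here is the same idea run against the question's sample rows. The table variable @Sample and the column aliases copies and extraCopies are names made up for this sketch, and the date literals use the ISO 8601 form with a T so they parse the same way regardless of language settings.

```sql
-- Hypothetical table variable holding the question's sample rows.
DECLARE @Sample TABLE (
    productId  INT          NOT NULL,
    categoryId INT          NOT NULL,
    state      VARCHAR(255) NOT NULL,
    dateDone   DATETIME     NOT NULL
);

INSERT INTO @Sample (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (2, 3, 'started', '2016-06-15T04:21:12.000'),
       (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (1, 3, 'done',    '2016-06-15T04:23:12.000');

-- One result row per group of identical rows that occurs more than once.
SELECT productId,
       categoryId,
       state,
       dateDone,
       COUNT(*)     AS copies,       -- how many times the row appears
       COUNT(*) - 1 AS extraCopies   -- how many would have to be removed
FROM @Sample
GROUP BY productId, categoryId, state, dateDone
HAVING COUNT(*) > 1;
-- Returns a single row: 1, 3, 'started', 2016-06-15 04:23:12.000, copies = 2, extraCopies = 1
```

Note that this reports each duplicated combination once; it does not by itself give you a handle for deleting individual copies.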

You can try a CTE and then restrict the final SELECT from the CTE to RN = 1, which keeps one copy of each distinct row. Here is the query:

;WITH ACTE 
AS 
(
    SELECT ProductID, categoryID, State, DateDone,
    RN = ROW_NUMBER() OVER(PARTITION BY ProductID, CategoryID, State, DateDone 
                            ORDER BY ProductID, CategoryID, State, DateDone) 
    FROM [Table] 
 ) 

SELECT * FROM ACTE WHERE RN = 1    
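
If the goal is to list the surplus copies rather than the de-duplicated set, a variation on this query can filter on RN > 1 instead. The sketch below assumes the question's sample rows; @Sample and Numbered are illustrative names, and ORDER BY (SELECT NULL) is used only because rows within a partition are identical, so no ordering is meaningful.

```sql
-- Hypothetical table variable holding the question's sample rows.
DECLARE @Sample TABLE (
    productId  INT          NOT NULL,
    categoryId INT          NOT NULL,
    state      VARCHAR(255) NOT NULL,
    dateDone   DATETIME     NOT NULL
);

INSERT INTO @Sample (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (2, 3, 'started', '2016-06-15T04:21:12.000'),
       (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (1, 3, 'done',    '2016-06-15T04:23:12.000');

WITH Numbered AS (
    SELECT productId, categoryId, state, dateDone,
           ROW_NUMBER() OVER (
               PARTITION BY productId, categoryId, state, dateDone
               ORDER BY (SELECT NULL)   -- identical rows have no meaningful order
           ) AS RN
    FROM @Sample
)
SELECT productId, categoryId, state, dateDone, RN
FROM Numbered
WHERE RN > 1;   -- second and later copies only
-- Returns a single row: 1, 3, 'started', 2016-06-15 04:23:12.000, RN = 2
```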

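For some reason I thought you wanted to delete them; I guess I misread the question. Just switch DELETE to SELECT * in my statement and you get all the duplicates rather than the originals. Using DELETE, however, removes all the duplicates and still leaves one copy of each record, which I suspect is what you want.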

IF OBJECT_ID('tempdb..#TT') IS NOT NULL
    BEGIN
        DROP TABLE #TT
    END

CREATE TABLE #TT (
    productId int not null,
    categoryId int not null,
    state varchar(255) not null,
    dateDone DATETIME not null
)

INSERT INTO #TT (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15 04:23:12.000')
,(2, 3, 'started', '2016-06-15 04:21:12.000')
,(1, 3, 'started', '2016-06-15 04:23:12.000')
,(1, 3, 'done', '2016-06-15 04:23:12.000')


SELECT *
FROM
    #TT

;WITH cte AS (
    SELECT
        *
        ,RowNum = ROW_NUMBER() OVER (PARTITION BY productId, categoryId, state, dateDone ORDER BY productId) -- what you order by doesn't matter, since rows in a partition are identical
    FROM
        #TT
)
-- to delete the extra copies, run this as is; to only list them, change DELETE to SELECT *
DELETE
FROM
    cte
WHERE
    RowNum > 1

-- verify: each distinct row now appears exactly once
SELECT *
FROM
    #TT

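If you want to, and are able to change the schema, you can also add an identity column after the fact; it will be populated for the existing records.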

ALTER TABLE #TT
ADD Id INTEGER IDENTITY(1,1) NOT NULL
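
Once such a column exists, the self-comparison described in the question becomes possible. The following is a minimal sketch, assuming a hypothetical temp table #Demo that mirrors the question's schema with an Id identity column already in place; the alias names are illustrative as well.

```sql
-- Hypothetical temp table: the question's schema plus an identity column.
CREATE TABLE #Demo (
    Id         INT IDENTITY(1,1) NOT NULL,
    productId  INT          NOT NULL,
    categoryId INT          NOT NULL,
    state      VARCHAR(255) NOT NULL,
    dateDone   DATETIME     NOT NULL
);

INSERT INTO #Demo (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (2, 3, 'started', '2016-06-15T04:21:12.000'),
       (1, 3, 'started', '2016-06-15T04:23:12.000'),
       (1, 3, 'done',    '2016-06-15T04:23:12.000');

-- A row is an extra copy if an otherwise identical row with a smaller Id exists.
-- Comparing on Id < Id keeps a row from matching itself.
SELECT d.*
FROM #Demo AS d
WHERE EXISTS (
    SELECT 1
    FROM #Demo AS earlier
    WHERE earlier.productId  = d.productId
      AND earlier.categoryId = d.categoryId
      AND earlier.state      = d.state
      AND earlier.dateDone   = d.dateDone
      AND earlier.Id         < d.Id
);
-- Returns one row: the later copy of (1, 3, 'started', 2016-06-15 04:23:12.000)

DROP TABLE #Demo;
```

Changing the outer SELECT d.* to DELETE d should turn this into a cleanup statement that keeps the copy with the lowest Id in each group, though it is worth running the SELECT form first to check what would be removed.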

You can do this with a window function. For example:

create table #tmp
   (
        Id INT
   )


insert into #tmp
VALUES (1), (1), (2) --so now we have duplicated rows



;WITH CTE AS -- the leading semicolon terminates the preceding INSERT, which a CTE requires
    (
     SELECT 
       ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id) AS [DuplicateCounter], 
       Id
     FROM #tmp
    )
DELETE FROM CTE
WHERE DuplicateCounter > 1 --duplicated rows have DuplicateCounter > 1