How to find duplicates in a table with no primary key or ID field?
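I inherited a SQL Server database that contains duplicate data. I need to find and delete the duplicate rows, but there is no ID field and I'm not sure how to identify them.
Normally I would LEFT JOIN the table to itself and check that all fields match except the ID (table1.id <> table2.id), but without an ID I don't know how to find the duplicate rows without each row also matching itself.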
TABLE:
productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null
Sample data:
1, 3, "started", "2016-06-15 04:23:12.000"
2, 3, "started", "2016-06-15 04:21:12.000"
1, 3, "started", "2016-06-15 04:23:12.000"
1, 3, "done", "2016-06-15 04:23:12.000"
In this example, only rows 1 and 3 are duplicates.
How can I find the duplicates?
Use GROUP BY with HAVING:
select
productId
, categoryId
, state
, dateDone
, count(*)
from your_table
group by productId ,categoryId ,state, dateDone
having count(*) >1
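The grouped query returns one summary row per duplicated combination. To see the individual duplicate rows instead, one option is to join that result back to the table. This is a minimal sketch, assuming the placeholder table name your_table from the query above; the aliases t and d and the column name copies are illustrative only. The equality join is safe here because every column is declared NOT NULL.

```sql
-- List every row that belongs to a duplicated group.
-- your_table is the placeholder name used above; t, d and copies are illustrative.
SELECT t.productId, t.categoryId, t.state, t.dateDone, d.copies
FROM your_table AS t
INNER JOIN (
    SELECT productId, categoryId, state, dateDone, COUNT(*) AS copies
    FROM your_table
    GROUP BY productId, categoryId, state, dateDone
    HAVING COUNT(*) > 1
) AS d
    ON  d.productId  = t.productId
    AND d.categoryId = t.categoryId
    AND d.state      = t.state
    AND d.dateDone   = t.dateDone;
```

With the sample data from the question, this should return the two identical (1, 3, 'started') rows.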
You can try a CTE and then restrict the final SELECT from the CTE to RN = 1, which returns one copy of each distinct row. Here is the query:
;WITH ACTE
AS
(
SELECT ProductID, categoryID, State, DateDone,
RN = ROW_NUMBER() OVER(PARTITION BY ProductID, CategoryID, State, DateDone
ORDER BY ProductID, CategoryID, State, DateDone)
FROM [Table]
)
SELECT * FROM ACTE WHERE RN = 1
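Filtering on RN = 1 gives the de-duplicated result rather than the duplicates themselves. To list only the surplus copies, the same CTE can be filtered on RN > 1. A sketch, reusing the placeholder name [Table] from the query above:

```sql
-- Same CTE as above, but keep only the surplus copies (RN > 1).
-- [Table] is the placeholder table name from the original query.
;WITH ACTE AS
(
    SELECT ProductID, CategoryID, State, DateDone,
           RN = ROW_NUMBER() OVER (PARTITION BY ProductID, CategoryID, State, DateDone
                                   ORDER BY ProductID)
    FROM [Table]
)
SELECT * FROM ACTE WHERE RN > 1;
```

Because all rows in a partition are identical, the ORDER BY column only needs to satisfy the syntax; it does not affect which rows are reported.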
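For some reason I thought you wanted to delete them; I suppose I misread. Either way, just switch the DELETE in my statement to a SELECT and you get all the duplicates but not the originals. Running it as a DELETE removes all the duplicates and still leaves one record of each, which I suspect is what you want.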
IF OBJECT_ID('tempdb..#TT') IS NOT NULL
BEGIN
DROP TABLE #TT
END
CREATE TABLE #TT (
productId int not null,
categoryId int not null,
state varchar(255) not null,
dateDone DATETIME not null
)
INSERT INTO #TT (productId, categoryId, state, dateDone)
VALUES (1, 3, 'started', '2016-06-15 04:23:12.000')
,(2, 3, 'started', '2016-06-15 04:21:12.000')
,(1, 3, 'started', '2016-06-15 04:23:12.000')
,(1, 3, 'done', '2016-06-15 04:23:12.000')
SELECT *
FROM
#TT
;WITH cte AS (
SELECT
*
,RowNum = ROW_NUMBER() OVER (PARTITION BY productId, categoryId, state, dateDone ORDER BY productId) --note what you order by doesn't matter
FROM
#TT
)
--if you want to delete them just do this otherwise change DELETE TO SELECT
DELETE
FROM
cte
WHERE
RowNum > 1
SELECT *
FROM
#TT
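If you want to, and are able to change the schema, you can also add an identity column after the fact; it will be populated for the existing records: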
ALTER TABLE #TT
ADD Id INTEGER IDENTITY(1,1) NOT NULL
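Once an identity column exists, the self-join style comparison described in the question becomes possible. The following is only a sketch, assuming #TT now has the Id column added above; the alias names t and earlier are illustrative. It is probably safest to run it as a separate batch after the ALTER TABLE so that the new column is visible when the statement is compiled.

```sql
-- Assumes #TT has the Id identity column added above.
-- Deletes every row for which an identical row with a smaller Id exists,
-- so the copy with the lowest Id in each group is kept.
DELETE t
FROM #TT AS t
WHERE EXISTS (
    SELECT 1
    FROM #TT AS earlier
    WHERE earlier.productId  = t.productId
      AND earlier.categoryId = t.categoryId
      AND earlier.state      = t.state
      AND earlier.dateDone   = t.dateDone
      AND earlier.Id         < t.Id
);
```

Changing DELETE t to SELECT t.* previews the rows that would be removed.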
You can do this with a window function. For example:
create table #tmp
(
Id INT
)
insert into #tmp
VALUES (1), (1), (2) --so now we have duplicated rows
;WITH CTE AS --the preceding INSERT has no terminator, so the semicolon is needed before WITH
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Id) AS [DuplicateCounter],
Id
FROM #tmp
)
DELETE FROM CTE
WHERE DuplicateCounter > 1 --duplicated rows have DuplicateCounter > 1
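After the delete, a quick check can confirm that nothing is left duplicated. A small sketch against the #tmp table from this example; the column name copies is illustrative:

```sql
-- Should return no rows once the duplicates are gone.
SELECT Id, COUNT(*) AS copies
FROM #tmp
GROUP BY Id
HAVING COUNT(*) > 1;
```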