Identify and delete duplicates
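Because of incorrect application code, I have to clean up a database that ended up with duplicates.
To get the necessary data, I joined the tables containing the quiz users, questions, and answers. This gives me: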
UserId | QuestionId | AnswerId | ChoiceId | LastUpdated | MaxAnswers
--------------------------------------------------------------------------------
17 | 17 | 374526 | 65 | 2014-01-21 16:08:00.057 | 3
17 | 17 | 3497 | 61 | NULL | 3
17 | 17 | 3498 | 69 | NULL | 3
17 | 17 | 3499 | 70 | NULL | 3
17 | 17 | 3500 | 72 | NULL | 3
17 | 17 | 4071 | 62 | NULL | 3
17 | 17 | 4072 | 63 | NULL | 3
17 | 17 | 258050 | 64 | NULL | 3
17 | 43 | 4059 | 210 | NULL | 1
17 | 43 | 4060 | 210 | NULL | 1
17 | 110 | 533242 | 12 | NULL | 2
17 | 110 | 536466 | 12 | NULL | 2
17 | 110 | 577857 | 12 | 2015-09-24 09:13:15.127 | 2
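I have to keep the top X answers per Question per User, where X is MaxAnswers, ordered by LastUpdated DESC, AnswerId DESC, and then delete the rest. The exception is when a ChoiceId appears more than once: in that case only one row for that ChoiceId should be kept.
MaxAnswers is always the same for a given QuestionId.
I currently have the select above (note: I had AnswerId ASC in the data sample above; that has been corrected), but I'm not sure how to proceed from there (using partition, I assume?).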
Edit: the expected output for this example is:
UserId | QuestionId | AnswerId | ChoiceId | LastUpdated | MaxAnswers
--------------------------------------------------------------------------------
17 | 17 | 374526 | 65 | 2014-01-21 16:08:00.057 | 3
17 | 17 | 258050 | 64 | NULL | 3
17 | 17 | 4072 | 63 | NULL | 3
17 | 43 | 4060 | 210 | NULL | 1
17 | 110 | 577857 | 12 | 2015-09-24 09:13:15.127 | 2
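Please try the following code:
;with cte as (
    -- number each user's answers per question, newest first (NULL LastUpdated sorts last)
    select
        *,
        rn = row_number() over (partition by UserId, QuestionId order by LastUpdated desc, AnswerId desc)
    from UserAnswers
)
-- delete every row ranked beyond MaxAnswers for its user and question
delete u
from UserAnswers u
inner join cte
    on u.UserId = cte.UserId and
       u.QuestionId = cte.QuestionId and
       u.AnswerId = cte.AnswerId
where cte.rn > cte.MaxAnswers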
You can also refer to the SQL tutorial in which the SQL Row_Number() function is used to delete duplicate rows.
This is for testing:
create table UserAnswers (
UserId int, QuestionId int, AnswerId int, ChoiceId int, LastUpdated datetime, MaxAnswers int
)
insert into UserAnswers select 17 , 17 , 374526 , 65 , '2014-01-21 16:08:00.057' , 3
insert into UserAnswers select 17 , 17 , 3497 , 61 , NULL , 3
insert into UserAnswers select 17 , 17 , 3498 , 69 , NULL , 3
insert into UserAnswers select 17 , 17 , 3499 , 70 , NULL , 3
insert into UserAnswers select 17 , 17 , 3500 , 72 , NULL , 3
insert into UserAnswers select 17 , 17 , 4071 , 62 , NULL , 3
insert into UserAnswers select 17 , 17 , 4072 , 63 , NULL , 3
insert into UserAnswers select 17 , 17 , 258050 , 64 , NULL , 3
insert into UserAnswers select 17 , 43 , 4059 , 210 , NULL , 1
insert into UserAnswers select 17 , 43 , 4060 , 210 , NULL , 1
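insert into UserAnswers select 17 , 110 , 533242 , 12 , NULL , 2
insert into UserAnswers select 17 , 110 , 536466 , 12 , NULL , 2
insert into UserAnswers select 17 , 110 , 577857 , 12 , '2015-09-24 09:13:15.127' , 2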
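The query above ranks rows only per user and question, so it does not cover the ChoiceId rule from the question. Below is a minimal sketch of one way to add that step. It first keeps a single row per ChoiceId and then applies the MaxAnswers cutoff. It rests on several assumptions: AnswerId uniquely identifies a row, the ChoiceId rule is applied before the top-X cutoff, and SQLite (through Python's sqlite3 module, which needs SQLite 3.25 or later for window functions) stands in for SQL Server so the example can be run anywhere. The names dedupe_answers and CLEANUP_SQL are hypothetical.

```python
import sqlite3

# Sample rows from the question:
# UserId, QuestionId, AnswerId, ChoiceId, LastUpdated, MaxAnswers
ROWS = [
    (17, 17, 374526, 65, "2014-01-21 16:08:00.057", 3),
    (17, 17, 3497, 61, None, 3),
    (17, 17, 3498, 69, None, 3),
    (17, 17, 3499, 70, None, 3),
    (17, 17, 3500, 72, None, 3),
    (17, 17, 4071, 62, None, 3),
    (17, 17, 4072, 63, None, 3),
    (17, 17, 258050, 64, None, 3),
    (17, 43, 4059, 210, None, 1),
    (17, 43, 4060, 210, None, 1),
    (17, 110, 533242, 12, None, 2),
    (17, 110, 536466, 12, None, 2),
    (17, 110, 577857, 12, "2015-09-24 09:13:15.127", 2),
]

# Assumption: AnswerId is unique, so it can identify the rows to keep.
CLEANUP_SQL = """
WITH one_per_choice AS (
    -- rank rows that share a ChoiceId within a user/question; rank 1 is the newest
    SELECT UserId, QuestionId, AnswerId, LastUpdated, MaxAnswers,
           ROW_NUMBER() OVER (
               PARTITION BY UserId, QuestionId, ChoiceId
               ORDER BY LastUpdated DESC, AnswerId DESC) AS choice_rn
    FROM UserAnswers
),
ranked AS (
    -- among the rows that survive the ChoiceId step, rank per user/question
    SELECT AnswerId, MaxAnswers,
           ROW_NUMBER() OVER (
               PARTITION BY UserId, QuestionId
               ORDER BY LastUpdated DESC, AnswerId DESC) AS rn
    FROM one_per_choice
    WHERE choice_rn = 1
)
DELETE FROM UserAnswers
WHERE AnswerId NOT IN (SELECT AnswerId FROM ranked WHERE rn <= MaxAnswers)
"""


def dedupe_answers(rows):
    """Load rows into an in-memory table, run the cleanup, return surviving AnswerIds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserAnswers (UserId INT, QuestionId INT, AnswerId INT, "
        "ChoiceId INT, LastUpdated TEXT, MaxAnswers INT)"
    )
    conn.executemany("INSERT INTO UserAnswers VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.execute(CLEANUP_SQL)
    kept = conn.execute(
        "SELECT AnswerId FROM UserAnswers "
        "ORDER BY QuestionId, LastUpdated DESC, AnswerId DESC"
    ).fetchall()
    conn.close()
    return [row[0] for row in kept]


if __name__ == "__main__":
    print(dedupe_answers(ROWS))  # [374526, 258050, 4072, 4060, 577857]
```

On the sample data this leaves the five rows listed in the expected output. The SQL uses only ROW_NUMBER and a CTE-backed DELETE, so it should carry over to SQL Server with little change, but it is worth running as a SELECT against a copy of the real table before deleting anything.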