Identify and delete duplicates

Because of a bug in the application code, the database ended up with duplicate rows, and I now have to clean it up.

To get the data I need, I joined the tables that hold the quiz users, questions and answers. This gives me:

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 3497     | 61       | NULL                    | 3
17     | 17         | 3498     | 69       | NULL                    | 3
17     | 17         | 3499     | 70       | NULL                    | 3
17     | 17         | 3500     | 72       | NULL                    | 3
17     | 17         | 4071     | 62       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 43         | 4059     | 210      | NULL                    | 1
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 533242   | 12       | NULL                    | 2
17     | 110        | 536466   | 12       | NULL                    | 2
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

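For each user and each question, I have to keep the first X answers, where X is MaxAnswers, ordered by LastUpdated DESC and then by AnswerId DESC, and delete the rest. The exception is when a ChoiceId appears more than once: in that case only one row with that ChoiceId should be kept. For a given QuestionId, MaxAnswers is always the same.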
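The cleanup rule can be sketched procedurally before writing any SQL. Below is a minimal Python version, assuming each row is a tuple in the column order of the table above, LastUpdated is an ISO-style string or None, and duplicate ChoiceIds are collapsed before they count toward MaxAnswers. SAMPLE, newest_first_key and answers_to_keep are hypothetical names. For each (UserId, QuestionId) pair it walks the answers newest-first (NULL LastUpdated last, ties broken by AnswerId descending), skips a ChoiceId it has already seen, and keeps at most MaxAnswers answers.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question (assumed tuple layout):
# (UserId, QuestionId, AnswerId, ChoiceId, LastUpdated, MaxAnswers)
# LastUpdated is an ISO-style string or None; such strings sort chronologically.
SAMPLE = [
    (17, 17, 374526, 65, "2014-01-21 16:08:00.057", 3),
    (17, 17, 3497, 61, None, 3),
    (17, 17, 3498, 69, None, 3),
    (17, 17, 3499, 70, None, 3),
    (17, 17, 3500, 72, None, 3),
    (17, 17, 4071, 62, None, 3),
    (17, 17, 4072, 63, None, 3),
    (17, 17, 258050, 64, None, 3),
    (17, 43, 4059, 210, None, 1),
    (17, 43, 4060, 210, None, 1),
    (17, 110, 533242, 12, None, 2),
    (17, 110, 536466, 12, None, 2),
    (17, 110, 577857, 12, "2015-09-24 09:13:15.127", 2),
]


def newest_first_key(row):
    """Key for LastUpdated DESC, AnswerId DESC with NULLs last (sort with reverse=True)."""
    answer_id, last_updated = row[2], row[4]
    return (last_updated is not None, last_updated or "", answer_id)


def answers_to_keep(rows):
    """Return the set of AnswerIds that survive the cleanup rule."""
    keep = set()
    group_key = itemgetter(0, 1)  # (UserId, QuestionId)
    for _, group in groupby(sorted(rows, key=group_key), key=group_key):
        seen_choices = set()
        for row in sorted(group, key=newest_first_key, reverse=True):
            answer_id, choice_id, max_answers = row[2], row[3], row[5]
            if choice_id in seen_choices:
                continue  # duplicate ChoiceId: only its highest-ranked row can survive
            seen_choices.add(choice_id)
            if len(seen_choices) <= max_answers:
                keep.add(answer_id)
    return keep


if __name__ == "__main__":
    kept = answers_to_keep(SAMPLE)
    print(sorted(kept))  # [4060, 4072, 258050, 374526, 577857]
    to_delete = sorted(row[2] for row in SAMPLE if row[2] not in kept)
    print(to_delete)  # [3497, 3498, 3499, 3500, 4059, 4071, 533242, 536466]
```

On the sample rows it keeps the five AnswerIds listed in the expected output given in the edit below and marks the other eight for deletion.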

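I currently have the select shown above (note: the data sample above originally had AnswerId ASC; this has been corrected), but I'm not sure how to go on from there. I assume I need to use a partition?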

Edit: the expected output for this example is:

UserId | QuestionId | AnswerId | ChoiceId | LastUpdated             | MaxAnswers
--------------------------------------------------------------------------------
17     | 17         | 374526   | 65       | 2014-01-21 16:08:00.057 | 3
17     | 17         | 258050   | 64       | NULL                    | 3
17     | 17         | 4072     | 63       | NULL                    | 3
17     | 43         | 4060     | 210      | NULL                    | 1
17     | 110        | 577857   | 12       | 2015-09-24 09:13:15.127 | 2

Try the following code:

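;with dedup as (
    -- Number the answers that share a ChoiceId; the highest-ranked one gets choice_rn = 1
    select *,
        choice_rn = row_number() over (partition by UserId, QuestionId, ChoiceId
                                       order by LastUpdated desc, AnswerId desc)
    from UserAnswers
),
ranked as (
    -- Rank the surviving answers per user and question (NULL LastUpdated sorts last in DESC)
    select *,
        rn = row_number() over (partition by UserId, QuestionId
                                order by LastUpdated desc, AnswerId desc)
    from dedup
    where choice_rn = 1
)
-- Delete every answer that is not among the first MaxAnswers distinct choices
delete u
from UserAnswers u
left join ranked r
    on  u.UserId = r.UserId and
        u.QuestionId = r.QuestionId and
        u.AnswerId = r.AnswerId and
        r.rn <= r.MaxAnswers
where r.AnswerId is null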

You can also refer to the following SQL tutorial, where the SQL Row_Number() function is used to delete duplicate rows.

Here is the script I used for testing:

create table UserAnswers (
UserId int, QuestionId int,  AnswerId int,  ChoiceId int,  LastUpdated datetime, MaxAnswers int
)
insert into UserAnswers select 17     , 17         , 374526   , 65       , '2014-01-21 16:08:00.057' ,   3
insert into UserAnswers select 17     , 17         , 3497     , 61       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3498     , 69       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3499     , 70       , NULL        , 3
insert into UserAnswers select 17     , 17         , 3500     , 72       , NULL        , 3
insert into UserAnswers select 17     , 17         , 4071     , 62       , NULL        , 3
insert into UserAnswers select 17     , 17         , 4072     , 63       , NULL        , 3
insert into UserAnswers select 17     , 17         , 258050   , 64       , NULL        , 3
insert into UserAnswers select 17     , 43         , 4059     , 210      , NULL        , 1
insert into UserAnswers select 17     , 43         , 4060     , 210      , NULL        , 1
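insert into UserAnswers select 17     , 110        , 533242   , 12       , NULL        , 2
insert into UserAnswers select 17     , 110        , 536466   , 12       , NULL        , 2
insert into UserAnswers select 17     , 110        , 577857   , 12       , '2015-09-24 09:13:15.127' ,   2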
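To try a ChoiceId-aware variant of the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is a port under several assumptions, not a drop-in T-SQL script: SQLite 3.25 or later stands in for SQL Server (window functions are required), LastUpdated is stored as text, duplicate ChoiceIds are collapsed before counting toward MaxAnswers, and because SQLite's DELETE does not accept a join, the rows to keep are selected with NOT IN instead. ROWS, CLEANUP_SQL and run_cleanup are hypothetical names; the rows are the full sample from the question, including AnswerIds 536466 and 577857.

```python
import sqlite3

# Full sample from the question (assumed tuple layout):
# (UserId, QuestionId, AnswerId, ChoiceId, LastUpdated, MaxAnswers)
ROWS = [
    (17, 17, 374526, 65, "2014-01-21 16:08:00.057", 3),
    (17, 17, 3497, 61, None, 3),
    (17, 17, 3498, 69, None, 3),
    (17, 17, 3499, 70, None, 3),
    (17, 17, 3500, 72, None, 3),
    (17, 17, 4071, 62, None, 3),
    (17, 17, 4072, 63, None, 3),
    (17, 17, 258050, 64, None, 3),
    (17, 43, 4059, 210, None, 1),
    (17, 43, 4060, 210, None, 1),
    (17, 110, 533242, 12, None, 2),
    (17, 110, 536466, 12, None, 2),
    (17, 110, 577857, 12, "2015-09-24 09:13:15.127", 2),
]

# SQLite, like SQL Server, treats NULL as the smallest value,
# so NULL LastUpdated values sort last under DESC.
CLEANUP_SQL = """
WITH dedup AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY UserId, QuestionId, ChoiceId
                              ORDER BY LastUpdated DESC, AnswerId DESC) AS choice_rn
    FROM UserAnswers
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY UserId, QuestionId
                              ORDER BY LastUpdated DESC, AnswerId DESC) AS rn
    FROM dedup
    WHERE choice_rn = 1
)
DELETE FROM UserAnswers
WHERE AnswerId NOT IN (SELECT AnswerId FROM ranked WHERE rn <= MaxAnswers)
"""


def run_cleanup(rows):
    """Load rows into an in-memory table, run the cleanup, return surviving AnswerIds."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE UserAnswers (UserId INTEGER, QuestionId INTEGER, "
            "AnswerId INTEGER, ChoiceId INTEGER, LastUpdated TEXT, MaxAnswers INTEGER)"
        )
        con.executemany("INSERT INTO UserAnswers VALUES (?, ?, ?, ?, ?, ?)", rows)
        con.execute(CLEANUP_SQL)
        return [
            answer_id
            for (answer_id,) in con.execute(
                "SELECT AnswerId FROM UserAnswers ORDER BY UserId, QuestionId, AnswerId"
            )
        ]
    finally:
        con.close()


if __name__ == "__main__":
    print(run_cleanup(ROWS))  # [4072, 258050, 374526, 4060, 577857]
```

The surviving AnswerIds are the same five rows as in the expected output from the question. Splitting the numbering into two CTEs means a repeated ChoiceId is removed before the per-question rank is computed, so duplicates never occupy one of the MaxAnswers slots.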