不使用游标删除重复行和依赖项

removing duplicate rows and dependencies without cursor

我有一个 table,其中有一长串重复项。我正在处理一个存储过程以将它们全部合并到一个记录中。每个重复的项目都有许多 child table 应该被删除,或者重新键入以指向结果记录。我的 table 有一个 Id,但 ReadableIdentifier 是我需要去重的列。

Id | ReadableIdentifier | Name        | UpdatedOn
1  | ABC1234            | Product X   | 2014-04-25 16:00:08.000
2  | ABC1234            | Product X   | 2014-04-28 16:00:08.000
3  | ABC1234            | Product X   | 2014-04-21 16:00:08.000
4  | ABDD9945           | Widget R    | 2014-04-25 16:00:08.000
5  | ABDD9945           | Widget R    | 2014-04-25 18:45:08.000

如您所见,记录 1-3 是具有不同 ID 和更新日期的重复记录。 4-5 也一样。我需要将这些合并到一个记录中,更喜欢具有最新 UpdatedOn 日期的那个。

最终目标(未显示 children tables):

Id | ReadableIdentifier | Name        | UpdatedOn
2  | ABC1234            | Product X   | 2014-04-28 16:00:08.000
5  | ABDD9945           | Widget R    | 2014-04-25 18:45:08.000

我正在使用 CURSOR 来执行此操作,但我想知道是否有更好的解决方案。

DECLARE dupeCursor CURSOR 
    FAST_FORWARD
FOR 
    WITH Counts AS (
        SELECT
            COUNT(1) Count,
            ReadableIdentifier
        FROM dbo.Item WITH (NOLOCK)
        WHERE ReadableIdentifier IS NOT NULL
        GROUP BY ReadableIdentifier)
    SELECT
        Counts.Count, 
        Counts.ReadableIdentifier,
        Counts.CompanyId
    FROM
        Counts
    WHERE Counts.Count > 1;

OPEN dupeCursor;
DECLARE @readableId VARCHAR(50);
DECLARE @itemToPersistId INT, @itemToDeleteId INT;
FETCH NEXT FROM dupeCursor INTO @readableId;

WHILE @@FETCH_STATUS = 0
BEGIN

   WITH V AS (
      SELECT Id, ROW_NUMBER() OVER (PARTITION BY ReadableId ORDER BY UpdatedOn DESC) as Row
        FROM dbo.Item WITH (NOLOCK) WHERE ReadableId = @readableId
   )
   SELECT @itemToPersistId = Id
   FROM V
   WHERE V.Row = 1

   CREATE TABLE #itemsToDelete (Id UNIQUEIDENTIFIER)
   INSERT INTO #itemsToDelete
   SELECT Id
   FROM dbo.Item WITH (NOLOCK)
   WHERE ReadableId = @readableId AND Id != @itemToPersistId;

   --UPDATE CHILDREN TABLES
   DELETE FROM dbo.ItemDetails WHERE ItemId IN (SELECT Id FROM #itemsToDelete);

   UPDATE dbo.ItemPurchases SET ItemId = @itemToPersistId 
   WHERE ItemId IN (SELECT Id FROM #itemsToDelete); 

   UPDATE dbo.PurchaseOrders SET ItemId = @itemToPersistId 
   WHERE ItemId IN (SELECT Id FROM #itemsToDelete); 

   DELETE FROM dbo.ItemMetadata WHERE ItemId IN (SELECT Id FROM #itemsToDelete); 

   --delete Duplicated Items
   DELETE FROM dbo.Item WHERE Id IN (SELECT Id FROM #itemsToDelete);

   DROP TABLE #itemsToDelete

   FETCH NEXT FROM dupeCursor INTO @readableId;
END

CLOSE dupeCursor;
DEALLOCATE dupeCursor;

我意识到光标很可能是问题所在,但我不确定如何在不使用一个的情况下更新所有 child table。

好的,我没有数据来测试子表,但它应该可以工作:

WITH V
     AS (SELECT *,
                ROW_NUMBER() OVER(PARTITION BY ReadableId ORDER BY UpdatedOn DESC) AS Row
         FROM dbo.Item WITH (NOLOCK))
     SELECT *
     INTO #itemsToDelete
     FROM V;

--UPDATE CHILDREN TABLES
DELETE FROM dbo.ItemDetails
WHERE ItemId IN
(
    SELECT Id
    FROM #itemsToDelete
    WHERE Row > 1
);
UPDATE IP
  SET
      IP.ItemId = itk.ID
FROM dbo.ItemPurchases AS IP
     INNER JOIN #itemsToDelete AS itd ON IP.ItemId = itd.ID
                                         AND itd.Row > 1
     INNER JOIN #itemsToDelete AS itk ON itk.ReadableIdentifier = itd.ReadableIdentifier
                                         AND itk.Row = 1
                                         AND itd.Row > 1;
UPDATE po
  SET
      po.ItemId = itk.ID
FROM dbo.PurchaseOrders AS po
     INNER JOIN #itemsToDelete AS itd ON po.ItemId = itd.ID
                                         AND itd.Row > 1
     INNER JOIN #itemsToDelete AS itk ON itk.ReadableIdentifier = itd.ReadableIdentifier
                                         AND itk.Row = 1
                                         AND itd.Row > 1;
DELETE FROM dbo.ItemMetadata
WHERE ItemId IN
(
    SELECT Id
    FROM #itemsToDelete
    WHERE Row > 1
); 

--delete Duplicated Items
DELETE FROM dbo.Item
WHERE Id IN
(
    SELECT Id
    FROM #itemsToDelete
    WHERE Row > 1
);