SQL Server - How to make partially duplicate rows inherit values from original row

Question

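To link records across datasets, I first deduplicated on the key linking variables (partitioning by name, date of birth, sex, etc. and deleting rows with row_number > 1). After the linking was complete, I was left with a new variable "unique_id", but it is only attributed to the original records (because I deleted the partial duplicates). I now want to re-attach this "unique_id" to all of the partial duplicates. How can I do that? Is there a better approach I could have used from the start?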

The data currently looks like this:

row_number unique_id id      first_name last_name activity_date
1          10        2       Davy       Jones     1726-11-25
2          --        12      Davy       Jones     1751-02-11
3          --        43      Davy       Jones     1811-06-15
1          100       12114   John       Smith     2018-06-01
2          --        123123  John       Smith     2022-07-05
1          90        2591    Mary       Sue       2013-05-18

I would like "unique_id" to inherit from the original, like this:

row_number unique_id id      first_name last_name activity_date
1          10        2       Davy       Jones     1726-11-25
2          10        12      Davy       Jones     1751-02-11
3          10        43      Davy       Jones     1811-06-15
1          100       12114   John       Smith     2018-06-01
2          100       123123  John       Smith     2022-07-05
1          90        2591    Mary       Sue       2013-05-18

The code to generate this table is as follows:

create table #test (
    unique_id int,
    id int,
    first_name varchar(255),
    last_name varchar(255),
    activity_date date
)

insert into #test 
values (100, 12114, 'John', 'Smith', '2018-06-01')

insert into #test (id, first_name, last_name, activity_date)
values (123123, 'John', 'Smith', '2022-07-05')

insert into #test
values (90, 2591, 'Mary', 'Sue', '2013-05-18')

insert into #test
values (10, 2, 'Davy', 'Jones', '1726-11-25')

insert into #test (id, first_name, last_name, activity_date)
values (12, 'Davy', 'Jones', '1751-02-11')

insert into #test (id, first_name, last_name, activity_date)
values (43, 'Davy', 'Jones', '1811-06-15')

select 
row_number() over (partition by first_name, last_name order by activity_date) as row_number
,unique_id, id, first_name, last_name, activity_date
from #test

Answer 1

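It seems you should join the subset of records that have a linked ID to the records that do not, using whichever fields you see fit, and then update the ID in the unlinked set from the ID in the linked set.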
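
A minimal sketch of that join-and-update, written against the #test table from the question. It assumes first_name and last_name are enough to identify a duplicate group (add date of birth, sex, etc. to the join as needed) and that each group has exactly one row with a non-null unique_id; with more than one, which value gets copied is not deterministic.

```sql
-- u = rows still missing a unique_id, l = rows that were linked
update u
set unique_id = l.unique_id
from #test as u
    inner join #test as l
        on  l.first_name = u.first_name
        and l.last_name  = u.last_name
where u.unique_id is null          -- only touch the partial duplicates
  and l.unique_id is not null;     -- only copy from linked originals

select * from #test;
```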

Answer 2

A simple method, assuming there is one value per first_name/last_name pair, is to use window functions:

select t.*, max(unique_id) over (partition by first_name, last_name) as new_unique_id
from #test t;

This can be put into an update:

with toupdate as (
      select t.*, max(unique_id) over (partition by first_name, last_name) as new_unique_id
      from #test t
     )
update toupdate
    set unique_id = new_unique_id;

Here is a rextester illustrating the syntax.
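
The update above relies on each first_name/last_name pair carrying at most one distinct unique_id, since max() would otherwise silently pick the largest. As a sketch, a check along these lines could be run first against the same #test table; the column alias distinct_ids is just an illustrative name:

```sql
-- Lists any name pair that has more than one distinct non-null unique_id.
-- An empty result means the one-value-per-pair assumption holds.
select first_name, last_name, count(distinct unique_id) as distinct_ids
from #test
group by first_name, last_name
having count(distinct unique_id) > 1;
```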

Answer 3

Try this:

with Dups as(
    select 
    row_number() over (partition by first_name, last_name order by first_name, last_name) as dup_number,
    -- dense_rank() over (order by first_name, last_name) as DuplicateGroupNumber, -- this allows you to see groups
    max(unique_id) over (partition by first_name, last_name) as GroupUniqueID,
    unique_id, id, first_name, last_name, activity_date
    from #test
)
update a
set unique_id = GroupUniqueID
from #test as a
    inner join Dups as b on a.id = b.id

select * from #test

Result

unique_id   id          first_name  
----------- ----------- ------------
100         12114       John        
100         123123      John        
90          2591        Mary        
10          2           Davy        
10          12          Davy        
10          43          Davy
