How to group rows using more than 1 key when there is no consistent 1:1 mapping between keys?

Question

uniqid	client_id	hh_id	group_id
u1	c1	h1	1
u1	c2	h1	1
u1	c3	h2	1
u2	c4	h1	1
u2	c5	h2	1
u3	c6	h3	2
u3	c7	h3	2
u3	c8	h4	2

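Assume a household can have more than one individual, and each individual has a master unique ID (uniqid) in the system. Because of the process/workflow, each individual can generate more than one client ID (client_id). In addition, in rare cases the same individual gets mapped to more than one household (hh_id).

The expected outcome is to put all related individuals into one group, say g1, so that all individuals who belong to a household (or to households that overlap through a shared individual) end up in one place.

Dataset: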

df <- data.frame(list(uniqid = c("u1", "u1", "u1", "u2", "u2", "u3", "u3", "u3"), 
                  client_id = c("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"), 
                  hh_id = c("h1", "h1", "h2", "h1", "h2", "h3", "h3", "h4"), 
                  group_id = c(1,1,1,1,1,2,2,2)))

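group_id is the expected output: one unique ID per set of related individuals (or households).

I tried grouping individuals like this, which solves part of the problem, but it misses the other household IDs that an individual is mapped to.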

library(dplyr)    # group_by, arrange, mutate, %>%
library(stringr)  # str_c

df %>% group_by(hh_id) %>% 
  arrange(hh_id, uniqid) %>% 
  mutate(hh_group = str_c(uniqid, collapse = ""))

Answer 1

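This is a graph problem: treat every uniqid and hh_id as a vertex and every row as an edge between them, and the groups are the connected components. Use the following: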

library(igraph)
df$groups <- components(graph_from_data_frame(df[c('uniqid', 'hh_id')]))$membership[df$uniqid]
df
  uniqid client_id hh_id group_id groups
1     u1        c1    h1        1      1
2     u1        c2    h1        1      1
3     u1        c3    h2        1      1
4     u2        c4    h1        1      1
5     u2        c5    h2        1      1
6     u3        c6    h3        2      2
7     u3        c7    h3        2      2
8     u3        c8    h4        2      2
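
To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea. The intermediate names `g` and `comp` are introduced only for illustration, and `directed = FALSE` is an optional choice (the default weak components of a directed graph give the same grouping). It assumes no uniqid value is spelled the same as an hh_id value; if that can happen, prefix one of the columns before building the graph.

```r
library(igraph)

df <- data.frame(
  uniqid    = c("u1", "u1", "u1", "u2", "u2", "u3", "u3", "u3"),
  client_id = c("c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"),
  hh_id     = c("h1", "h1", "h2", "h1", "h2", "h3", "h3", "h4")
)

# Each row becomes an edge uniqid -- hh_id; the vertex names are the ID strings.
g <- graph_from_data_frame(df[c("uniqid", "hh_id")], directed = FALSE)

# Connected components: a named vector mapping vertex name -> component number.
comp <- components(g)$membership

# Look up each row's component through its uniqid (name-based indexing).
df$groups <- unname(comp[df$uniqid])
```

Looking the group up through hh_id instead (`comp[df$hh_id]`) should give the same labels, since both endpoints of a row's edge always sit in the same component.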

group-by

r