Transitive relations in R: finding all linked records of a value
I have a data frame that shows linked records:
df <- data.frame(case = c(1,2,3,4,5,6), linked_to = c("2,4", 3,NA,NA,6,NA), stringsAsFactors = FALSE)
#   case linked_to
#      1       2,4
#      2         3
#      3      <NA>
#      4      <NA>
#      5         6
#      6      <NA>
In this example, case 1 is linked to cases 2 and 4. Because case 2 is also linked to case 3, case 1 is linked to cases 2, 3 and 4. I want to create a new column that lists all linked cases:
#   case linked_to all_linked
#      1       2,4    1,2,3,4
#      2         3    1,2,3,4
#      3      <NA>    1,2,3,4
#      4      <NA>    1,2,3,4
#      5         6        5,6
#      6      <NA>        5,6
I can do this with the decompose.graph function from igraph, which finds the isolated components, but the solution seems rather convoluted:
library(igraph)
# Transform to igraph format
to <- sapply(df$linked_to, function(x) unlist(strsplit(x, ",")))
from <- rep(rownames(df), sapply(to, length))
to <- unlist(to)
from <- from[!is.na(to)]
to <- to[!is.na(to)]
d <- data.frame(from, to)
gr <- graph.data.frame(d)
# Split into components
grs <- decompose.graph(gr)
# lapply always returns a list, even when all components have the same size
comp <- lapply(grs, function(x) V(x)$name)
matches <- sapply(df$case, function(case) {
  sapply(comp, function(comp) {
    case %in% comp
  })
})
matches <- as.data.frame(matches)
ind <- sapply(matches, which)
# Assign all members of the component they belong to to each vertex
df$all_linked <- sapply(ind, function(x) {
  paste(comp[[x]], collapse = ",")
})
Is there a simpler, more efficient solution? It may, but does not have to, rely on network analysis tools.
This one is a bit more efficient and uses the kpath.census function from the sna package. (Alternatively, you could use distances in igraph to the same effect.)
library(sna)
df <- data.frame(case = c(1,2,3,4,5,6),
                 linked_to = c("2,4", 3,NA,NA,6,NA),
                 stringsAsFactors = FALSE)
# Edge list with one row per link, entered by hand from df$linked_to
net <- data.frame(case = c(1,1,2,3,4,5,6),
                  linked_to = c(2, 4, 3,NA,NA,6,NA),
                  stringsAsFactors = FALSE)
g <- network(net[complete.cases(net),], directed = FALSE)
# Path co-membership counts between vertices; maxlen caps the path length considered
comemb <- kpath.census(g, maxlen = 10, mode = "digraph", tabulate.by.vertex = TRUE,
                       path.comembership = "sum")$path.comemb
# Replace each positive count with the name of the co-member vertex, 0 otherwise
comemb_names <- sapply(1:ncol(comemb),
                       function(x) ifelse(comemb[x,] > 0,
                                          colnames(comemb)[x], 0))
# For each vertex, keep only the names of the vertices it shares a path with
comemb_names <- lapply(1:nrow(comemb_names), function(x) comemb_names[x,][comemb_names[x,] != "0"])
df$all_linked <- sapply(comemb_names, function(x) paste(x, collapse = ","))
Result:
> df
  case linked_to all_linked
1    1       2,4    1,2,3,4
2    2         3    1,2,3,4
3    3      <NA>    1,2,3,4
4    4      <NA>    1,2,3,4
5    5         6        5,6
6    6      <NA>        5,6
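The igraph alternative mentioned in passing above (distances) could look roughly like the following. This is only a sketch, not the answer's own code: it assumes linked_to is a character column of comma-separated case IDs, that every ID it names also appears in case, and that a finite shortest-path distance is an acceptable test for "same group". The names targets, edges and d are made up for the illustration.

```r
library(igraph)

df <- data.frame(case = c(1, 2, 3, 4, 5, 6),
                 linked_to = c("2,4", "3", NA, NA, "6", NA),
                 stringsAsFactors = FALSE)

# Build the edge list from the comma-separated links instead of typing it by hand
targets <- strsplit(df$linked_to, ",")
edges <- data.frame(from = rep(df$case, lengths(targets)),
                    to   = as.numeric(unlist(targets)))
edges <- edges[!is.na(edges$to), ]

# Listing every case as a vertex keeps cases without any link in the graph
g <- graph_from_data_frame(edges, directed = FALSE,
                           vertices = data.frame(name = df$case))

# A finite distance between two cases means they are in the same component
d <- distances(g)
df$all_linked <- unname(apply(d, 1, function(row) {
  paste(colnames(d)[is.finite(row)], collapse = ",")
}))
```

Note that distances() builds a full n-by-n matrix, so for a large number of cases igraph's components(g)$membership is probably the lighter way to get the same grouping.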
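The question also leaves room for a solution without network analysis tools. A minimal base R sketch of that idea follows; it is a swapped-in technique (repeated minimum-label propagation over the links), not something from the answer above. It assumes every ID named in linked_to also appears in case, and the names group, fi and ti are made up for the illustration.

```r
df <- data.frame(case = c(1, 2, 3, 4, 5, 6),
                 linked_to = c("2,4", "3", NA, NA, "6", NA),
                 stringsAsFactors = FALSE)

# Edge list from the comma-separated links
targets <- strsplit(df$linked_to, ",")
from <- rep(df$case, lengths(targets))
to   <- as.numeric(unlist(targets))
keep <- !is.na(to)

# Row positions of both ends of every link (assumes each ID exists in df$case)
fi <- match(from[keep], df$case)
ti <- match(to[keep], df$case)

# Start with every case in its own group, labelled by its row position
group <- seq_along(df$case)

# Give both ends of each link the smaller label until nothing changes
repeat {
  old <- group
  for (k in seq_along(fi)) {
    m <- min(group[fi[k]], group[ti[k]])
    group[fi[k]] <- m
    group[ti[k]] <- m
  }
  if (identical(group, old)) break
}

# For each case, list all cases that ended up with the same label
df$all_linked <- vapply(group, function(g) {
  paste(df$case[group == g], collapse = ",")
}, character(1))
```

The loop stops once every link joins two cases with the same label, which is exactly when all transitively linked cases share a group. On long chains of links it may need many passes, so a graph library is likely the better choice for large data.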