How to efficiently transform a table of a hierarchical network into an edge list in R

I know there are many open and answered questions about generating edge lists, but I found none that fits my case:

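I have a table representing a strictly hierarchical network, and I want to transform it into an edge list with three columns: source node, target node, and type of interaction (level). The table is rather redundant: the first column lists all first-level nodes, the second column lists the corresponding second-level nodes, and so on: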

  first second third
1     A      F     L
2     B      F     L
3     C      G     L
4     D      F     L
5     E      G     L
6     L      L     L

For a visualisation of the network, see the figure below:

The table I want looks like this:

  source target  level
1      F      L  third
2      G      L  third
3      L      L  third
4      A      F second
5      B      F second
6      C      G second
7      D      F second
8      E      G second
9      L      L second

So far I have only needed to do this once, with very few levels, so I used the following clumsy approach with dplyr:

library(dplyr)
example.df <- data.frame(
  "first"  = c("A", "B", "C", "D", "E", "L"),
  "second" = c("F", "F", "G", "F", "G", "L"),
  "third"  = c("L", "L", "L", "L", "L", "L")
)
name.v <- c("source","target")
third.df <- example.df %>% 
  group_by(second) %>% 
  summarise(third = unique(third))
names(third.df) <- name.v
second.df <- example.df %>% 
  group_by(first) %>% 
  summarise(second = unique(second))
names(second.df) <- name.v
hier.df <- bind_rows("third" = third.df, "second"= second.df, .id = "level") %>% 
  select(source, target, level)
# using igraph to generate the image
library(igraph)
hier.graph <- graph_from_data_frame(hier.df)
plot(hier.graph)

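Obviously, this gets ugly quickly as the number of levels grows. For programming purposes I would prefer a leaner approach, for example in base R.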

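Maybe this is clumsy too, but columns 1 and 2 give the second-level connections, and columns 2 and 3 give the third-level ones. Just extract them separately and combine them with rbind.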

SL = unique(cbind(TAB[,1:2], "second"))
names(SL) = c("source", "target", "level")

TL = unique(cbind(TAB[,2:3], "third"))
names(TL) = c("source", "target", "level")

rbind(TL, SL)
   source target  level
1       F      L  third
3       G      L  third
6       L      L  third
11      A      F second
2       B      F second
31      C      G second
4       D      F second
5       E      G second
61      L      L second

Data

TAB = read.table(text="first second third
1     A      F     L
2     B      F     L
3     C      G     L
4     D      F     L
5     E      G     L
6     L      L     L",
header=TRUE)
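
The same idea extends to any number of levels by looping over adjacent column pairs. The following is only a minimal base R sketch, not part of the answer above: `edge_list` is a hypothetical helper name, and it assumes the columns are ordered from the lowest to the highest level and that each edge should be labelled with the name of its target column.

```r
# Hypothetical helper (not from the original answer): builds an edge list
# from every pair of adjacent columns in a hierarchy table.
edge_list <- function(tab) {
  stopifnot(ncol(tab) >= 2)
  pieces <- lapply(seq_len(ncol(tab) - 1), function(i) {
    # Columns i and i + 1 hold the source and target nodes of one level.
    edges <- unique(data.frame(
      source = as.character(tab[[i]]),
      target = as.character(tab[[i + 1]]),
      stringsAsFactors = FALSE
    ))
    # Label each edge with the name of the target column.
    edges$level <- names(tab)[i + 1]
    edges
  })
  # Put the highest level first, as in the desired output.
  out <- do.call(rbind, rev(pieces))
  rownames(out) <- NULL
  out
}

# Same data as TAB above, rebuilt here so the sketch is self-contained.
TAB <- data.frame(
  first  = c("A", "B", "C", "D", "E", "L"),
  second = c("F", "F", "G", "F", "G", "L"),
  third  = c("L", "L", "L", "L", "L", "L"),
  stringsAsFactors = FALSE
)

result <- edge_list(TAB)
print(result)
```

For the example table this should give the same nine edges as the rbind call above, with row names reset to 1 through 9.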

You could try map_df here.

Your data

df <- read.table(text="  first second third
A      F     L
B      F     L
C      G     L
D      F     L
E      G     L
L      L     L", header=TRUE, stringsAsFactors=FALSE)

Solution

library(dplyr)  # select(), group_by(), distinct(), mutate() and %>%
library(purrr)  # map_df()
map_df(2:ncol(df), ~select(df, (.x-1):.x) %>% setNames(c("source", "target")), .id="id") %>%
    group_by(id) %>%
    distinct() %>%
    ungroup() %>%
    mutate(id = colnames(df)[as.numeric(id)+1])

# A tibble: 9 x 3
      # id source target
   # <chr>  <chr>  <chr>
# 1 second      A      F
# 2 second      B      F
# 3 second      C      G
# 4 second      D      F
# 5 second      E      G
# 6 second      L      L
# 7  third      F      L
# 8  third      G      L
# 9  third      L      L

This scales to any number of columns:

set.seed(1)
new_df <- as_tibble(matrix(sample(LETTERS, 25, replace=FALSE), ncol=5)) %>%
        setNames(c("first", "second", "third", "fourth", "fifth"))

myfun <- function(data) {
    map_df(2:ncol(data), ~select(data, (.x-1):.x) %>% setNames(c("source", "target")), .id="id") %>%
        group_by(id) %>%
        distinct() %>%
        ungroup() %>%
        mutate(id = colnames(data)[as.numeric(id)+1])
}
myfun(new_df)

# A tibble: 20 x 3
       # id source target
    # <chr>  <chr>  <chr>
 # 1 second      G      S
 # 2 second      J      W
 # 3 second      N      M
 # 4 second      U      L
 # 5 second      E      B
 # 6  third      S      D
 # 7  third      W      C
 # 8  third      M      Y
 # 9  third      L      V
# 10  third      B      X
# 11 fourth      D      F
# 12 fourth      C      H
# 13 fourth      Y      I
# 14 fourth      V      P
# 15 fourth      X      K
# 16  fifth      F      Z
# 17  fifth      H      Q
# 18  fifth      I      O
# 19  fifth      P      A
# 20  fifth      K      R

igraph's as_data_frame() will do this for you. The what argument can be "edges", "vertices", or "both"; "both" returns the vertices and the edges together as a list of data frames.

?igraph::as_data_frame

igraph::as_data_frame(x = hier.graph, what = "edges") %>%
`colnames<-`(c("source", "target", "level"))

# source target  level
# 1      F      L  third
# 2      G      L  third
# 3      L      L  third
# 4      A      F second
# 5      B      F second
# 6      C      G second
# 7      D      F second
# 8      E      G second
# 9      L      L second
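
To illustrate the what = "both" option mentioned above, here is a small self-contained sketch. The `edges.df` data frame is made up for illustration and only mimics the shape of hier.df. The igraph:: prefix is kept on purpose, since other attached packages (dplyr, for instance) may provide a function with the same name.

```r
library(igraph)

# Illustrative edge list in the same shape as hier.df (source, target, level).
edges.df <- data.frame(
  source = c("F", "G", "A", "B", "C"),
  target = c("L", "L", "F", "F", "G"),
  level  = c("third", "third", "second", "second", "second"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges.df)

# what = "both" returns a list holding one data frame of vertices
# and one data frame of edges.
both <- igraph::as_data_frame(g, what = "both")
print(both$vertices)  # one row per vertex, with a name column
print(both$edges)     # columns from, to, plus the edge attribute level
```

Note that the edge columns come back named from and to, which is why the answer above renames them to source and target.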