How to efficiently transform a table of a hierarchical network into an edge list in R
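I know there are many open and answered questions about generating edge lists, but I found none that fits my case:
I have a table representing a strictly hierarchical network, and I want to transform it into a table with 3 columns: source node, target node, and type of interaction. The table is quite redundant: the first column lists all first-level nodes, the second column lists the corresponding second-level nodes, and so on: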
first second third
1 A F L
2 B F L
3 C G L
4 D F L
5 E G L
6 L L L
For a visualization of the network, see the following image:
The table I want looks like this:
source target level
1 F L third
2 G L third
3 L L third
4 A F second
5 B F second
6 C G second
7 D F second
8 E G second
9 L L second
So far I have only had to do this once, with very few levels, so I used the following clumsy approach with dplyr:
library(dplyr)
example.df <- data.frame(
"first" = c("A", "B", "C", "D", "E", "L"),
"second" = c("F", "F", "G", "F", "G", "L"),
"third" = c("L", "L", "L", "L", "L", "L")
)
name.v <- c("source","target")
third.df <- example.df %>%
  group_by(second) %>%
  summarise(third = unique(third))
names(third.df) <- name.v
second.df <- example.df %>%
  group_by(first) %>%
  summarise(second = unique(second))
names(second.df) <- name.v
hier.df <- bind_rows("third" = third.df, "second" = second.df, .id = "level") %>%
  select(source, target, level)
# using igraph to generate the image
library(igraph)
hier.graph <- graph_from_data_frame(hier.df)
plot(hier.graph)
Obviously, this scales very badly. So for programming, I would prefer a leaner approach, e.g. in base R.
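Maybe this is clunky too, but columns 1 and 2 establish the second-level connections, and columns 2 and 3 the third level. Just extract them separately and combine them with rbind.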
SL = unique(cbind(TAB[,1:2], "second"))
names(SL) = c("source", "target", "level")
TL = unique(cbind(TAB[,2:3], "third"))
names(TL) = c("source", "target", "level")
rbind(TL, SL)
source target level
1 F L third
3 G L third
6 L L third
11 A F second
2 B F second
31 C G second
4 D F second
5 E G second
61 L L second
Data
TAB = read.table(text="first second third
1 A F L
2 B F L
3 C G L
4 D F L
5 E G L
6 L L L",
header=TRUE)
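The same idea can be extended to any number of columns by looping over adjacent column pairs. The following is only a minimal base R sketch: the helper name `edge_list` is hypothetical (it is not part of the answer above), and it assumes that the columns are ordered from the first level to the last and that the column names serve as level labels.

```r
# Hypothetical helper: build an edge list from every pair of adjacent columns.
# Assumes columns are ordered from the first (lowest) to the last level.
edge_list <- function(tab) {
  stopifnot(ncol(tab) >= 2)
  pairs <- lapply(seq_len(ncol(tab) - 1), function(i) {
    # Column i holds the sources, column i + 1 the targets;
    # unique() drops the redundant rows of the wide table.
    unique(data.frame(
      source = as.character(tab[[i]]),
      target = as.character(tab[[i + 1]]),
      level  = names(tab)[i + 1],
      stringsAsFactors = FALSE
    ))
  })
  # Put the deepest level first, as in the desired output of the question
  out <- do.call(rbind, rev(pairs))
  rownames(out) <- NULL
  out
}

TAB <- read.table(text = "first second third
1 A F L
2 B F L
3 C G L
4 D F L
5 E G L
6 L L L",
  header = TRUE, stringsAsFactors = FALSE)

edges <- edge_list(TAB)
print(edges)
```

Resetting the row names is a cosmetic choice; it avoids the non-consecutive row names seen in the rbind output above.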
You can try using map_df here.
Your data
df <- read.table(text=" first second third
A F L
B F L
C G L
D F L
E G L
L L L", header=TRUE, stringsAsFactors=FALSE)
Solution
library(dplyr)  # select(), group_by(), distinct(), mutate() and %>%
library(purrr)
map_df(2:ncol(df), ~select(df, (.x-1):.x) %>% setNames(c("source", "target")), .id="id") %>%
  group_by(id) %>%
  distinct() %>%
  ungroup() %>%
  mutate(id = colnames(df)[as.numeric(id)+1])
# A tibble: 9 x 3
# id source target
# <chr> <chr> <chr>
# 1 second A F
# 2 second B F
# 3 second C G
# 4 second D F
# 5 second E G
# 6 second L L
# 7 third F L
# 8 third G L
# 9 third L L
This scales to any number of columns:
set.seed(1)
new_df <- as_tibble(matrix(sample(LETTERS, 25, replace=FALSE), ncol=5)) %>%
setNames(c("first", "second", "third", "fourth", "fifth"))
myfun <- function(data) {
  map_df(2:ncol(data), ~select(data, (.x-1):.x) %>% setNames(c("source", "target")), .id="id") %>%
    group_by(id) %>%
    distinct() %>%
    ungroup() %>%
    mutate(id = colnames(data)[as.numeric(id)+1])
}
myfun(new_df)
# A tibble: 20 x 3
# id source target
# <chr> <chr> <chr>
# 1 second G S
# 2 second J W
# 3 second N M
# 4 second U L
# 5 second E B
# 6 third S D
# 7 third W C
# 8 third M Y
# 9 third L V
# 10 third B X
# 11 fourth D F
# 12 fourth C H
# 13 fourth Y I
# 14 fourth V P
# 15 fourth X K
# 16 fifth F Z
# 17 fifth H Q
# 18 fifth I O
# 19 fifth P A
# 20 fifth K R
igraph's as_data_frame() will take care of this for you. The what argument can be "edges", "vertices", or "both"; the last of these returns the vertices and the edges in a list of data.frames.
?igraph::as_data_frame
igraph::as_data_frame(x = hier.graph, what = "edges") %>%
`colnames<-`(c("source", "target", "level"))
# source target level
# 1 F L third
# 2 G L third
# 3 L L third
# 4 A F second
# 5 B F second
# 6 C G second
# 7 D F second
# 8 E G second
# 9 L L second
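To illustrate the "both" option mentioned above, here is a small sketch. It assumes the igraph package is installed; the three-row `edges` data frame is a made-up subset of the example, used only to keep the snippet self-contained.

```r
library(igraph)

# Illustrative subset of the example edge list (an assumption for this sketch)
edges <- data.frame(
  source = c("F", "G", "A"),
  target = c("L", "L", "F"),
  level  = c("third", "third", "second"),
  stringsAsFactors = FALSE
)

# The first two columns are taken as the edge endpoints,
# any further columns become edge attributes.
g <- graph_from_data_frame(edges)

# what = "both" returns a list holding a vertex and an edge data frame
both <- igraph::as_data_frame(g, what = "both")
names(both)        # "vertices" "edges"
names(both$edges)  # "from" "to" "level"
```

The edge data frame comes back with the columns named from and to, which is why the answer above renames them to source and target.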