Sankey Diagram with the networkD3 package in R
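I am trying to create a simple Sankey diagram following the R Graph Gallery instructions: https://www.r-graph-gallery.com/322-custom-colours-in-sankey-diagram.html. I have a dataset with two observations per ID. For each period, I know whether the person is poor. The dataset looks like this: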
ID YEAR POVERTY
1 2018 0
1 2019 1
2 2018 1
2 2019 1
3 2018 0
3 2019 1
4 2018 0
4 2019 0
5 2018 0
5 2019 0
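I think I need to convert this into a source-target-value table, but I don't understand what "value" is for. Could someone explain it to me? How should I proceed?
Thank you very much for your help :)
I have used the code provided: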
library("dplyr", warn.conflicts = FALSE)
library("networkD3")
diagram <- SUBSET05 %>%
  dplyr::mutate(Poverty = dplyr::if_else(Poverty == 1, "poor", "not poor")) %>%
  dplyr::transmute(id_nmbr, yr_interview, Poverty = paste(Poverty, yr_interview, sep = "_"))
links <- diagram %>%
  tidyr::pivot_wider(names_from = yr_interview, values_from = Poverty) %>%
  dplyr::rename(source = `2018`, target = `2019`)
nodes <- data.frame(name = unique(c(links$source, links$target))) %>%
  tidyr::separate(name, into = c("group", "year"), sep = "_", remove = FALSE)
links$id_nmbrsource <- match(links$source, nodes$name) - 1
links$id_nmbrtarget <- match(links$target, nodes$name) - 1
links$value <- 10
sn <- sankeyNetwork(Links = links,
                    Nodes = nodes,
                    NodeID = "name",
                    Source = "id_nmbrsource",
                    Target = "id_nmbrtarget",
                    NodeGroup = "group",
                    Value = "value")
sn
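I get the following image:
My dataset has 34034 observations, 17017 per year. Do I need to change the value column because of that? What is causing the ugly image?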
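I'm not sure I fully understand what you want the output to look like.
In any case, I don't think "value" matters much for you. Every connection has the same importance, so you can set it to any arbitrary value.
If the point is just to show how many people move from poor to not poor, the starting point should be that you actually have four groups: "poor" and "not poor" in each of the two periods.
The result would be something like this: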
library("dplyr", warn.conflicts = FALSE)
library("networkD3")
# Toy data matching the question's layout
df <- tibble::tribble(
  ~ID, ~YEAR, ~POVERTY,
  "1", 2018, 0,
  "1", 2019, 1,
  "2", 2018, 1,
  "2", 2019, 1,
  "3", 2018, 0,
  "3", 2019, 1,
  "4", 2018, 0,
  "4", 2019, 0,
  "5", 2018, 0,
  "5", 2019, 0
) %>%
  # POVERTY == 1 is labelled "poor", as in the question's own code
  dplyr::mutate(POVERTY = dplyr::if_else(POVERTY == 1, "poor", "not poor")) %>%
  # Build one label per status and year, e.g. "poor_2018"
  dplyr::transmute(ID, YEAR, POVERTY = paste(POVERTY, YEAR, sep = "_"))
# One row per ID: 2018 status is the source, 2019 status is the target
links <- df %>%
  tidyr::pivot_wider(names_from = YEAR, values_from = POVERTY) %>%
  dplyr::rename(source = `2018`, target = `2019`)
# The four nodes: "poor" / "not poor" in each year
nodes <- data.frame(name = unique(c(links$source, links$target))) %>%
  tidyr::separate(name, into = c("group", "year"), sep = "_", remove = FALSE)
# sankeyNetwork expects zero-based node indices
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1
links$value <- 10
sn <- sankeyNetwork(Links = links,
                    Nodes = nodes,
                    NodeID = "name",
                    Source = "IDsource",
                    Target = "IDtarget",
                    NodeGroup = "group",
                    Value = "value")
sn
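With 17017 people, one link per person may be part of why the plot looks cluttered, since every row of the links table is drawn as its own band. A possible variation, sketched here under assumptions rather than taken from the answer above, is to collapse the links to one row per distinct transition and use the number of people as the value, so that the band width reflects the head count. The names `transitions` and `links_agg` are made up for this sketch, and it assumes POVERTY == 1 means "poor".

```r
library("dplyr", warn.conflicts = FALSE)
library("networkD3")

# Toy data from the question (assumption: POVERTY == 1 means "poor")
df <- tibble::tribble(
  ~ID, ~YEAR, ~POVERTY,
  "1", 2018, 0,
  "1", 2019, 1,
  "2", 2018, 1,
  "2", 2019, 1,
  "3", 2018, 0,
  "3", 2019, 1,
  "4", 2018, 0,
  "4", 2019, 0,
  "5", 2018, 0,
  "5", 2019, 0
)

# One row per person: status label in 2018 (source) and in 2019 (target)
transitions <- df %>%
  dplyr::mutate(status = paste(dplyr::if_else(POVERTY == 1, "poor", "not poor"),
                               YEAR, sep = "_")) %>%
  dplyr::select(ID, YEAR, status) %>%
  tidyr::pivot_wider(names_from = YEAR, values_from = status) %>%
  dplyr::rename(source = `2018`, target = `2019`)

# Collapse to one row per distinct transition; value = number of people
links_agg <- transitions %>%
  dplyr::count(source, target, name = "value") %>%
  as.data.frame()

# Node table, built the same way as in the answer
nodes <- data.frame(name = unique(c(links_agg$source, links_agg$target))) %>%
  tidyr::separate(name, into = c("group", "year"), sep = "_", remove = FALSE)

# Zero-based node indices for sankeyNetwork
links_agg$IDsource <- match(links_agg$source, nodes$name) - 1
links_agg$IDtarget <- match(links_agg$target, nodes$name) - 1

sn <- sankeyNetwork(Links = links_agg,
                    Nodes = nodes,
                    NodeID = "name",
                    Source = "IDsource",
                    Target = "IDtarget",
                    NodeGroup = "group",
                    Value = "value")
```

Printing `sn` renders the diagram. On the real data the same steps should give at most four links (2 statuses in 2018 times 2 in 2019), however many people there are.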