Sankey diagram labels in R
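Background
I am creating a Sankey diagram in R, and I am struggling to label the nodes.
As an example, I will use a dataset of 10 hypothetical patients who are screened for COVID-19. At baseline, all patients are negative for COVID-19. After, let's say, 1 week, all patients are tested again: now 3 patients are positive, 6 are negative, and 1 has an inconclusive result. Yet another week later, the 3 positive patients remain positive, 1 patient goes from negative to positive, and the others are negative.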
data <- data.frame(patient = 1:10,
                   baseline = rep("neg", 10),
                   test1 = c(rep("pos", 3), rep("neg", 6), "inconcl"),
                   test2 = c(rep(NA, 3), "pos", rep("neg", 6)))
Attempt
To create the Sankey diagram, I use the ggsankey package:
library(tidyverse)
#devtools::install_github("davidsjoberg/ggsankey")
library(ggsankey)
df <- data %>%
  make_long(baseline, test1, test2)
ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node,
               fill = factor(node), label = node)) +
  geom_sankey() +
  geom_sankey_label(aes(fill = factor(node)), size = 3, color = "white") +
  scale_fill_manual(values = c("grey", "green", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())
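Question
I would like to label the nodes with the number of patients present in each node (e.g., the first node would be labelled 10, the inconclusive node would be labelled 1, and so on).
How do I do this in R without hard-coding the values?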
Partial solution
To extract the numbers from the data, I assume the initial step should be something like this:
data %>% count(baseline, test1, test2)
# baseline test1 test2 n
#1 neg inconcl neg 1
#2 neg neg neg 5
#3 neg neg pos 1
#4 neg pos <NA> 3
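Note that counting over all three columns gives one row per patient trajectory rather than one row per node. A minimal sketch of per-node counts follows, assuming tidyr's pivot_longer() for the reshaping step (this is not part of the original attempt, and node_counts is a hypothetical name used only for illustration):

```r
library(dplyr)
library(tidyr)

# Same example data as above
data <- data.frame(patient = 1:10,
                   baseline = rep("neg", 10),
                   test1 = c(rep("pos", 3), rep("neg", 6), "inconcl"),
                   test2 = c(rep(NA, 3), "pos", rep("neg", 6)))

# Reshape to one row per patient and time point, then count
# the patients in each (time point, result) pair, i.e. each node
node_counts <- data %>%
  pivot_longer(c(baseline, test1, test2),
               names_to = "x", values_to = "node") %>%
  filter(!is.na(node)) %>%          # missing results do not form a node
  count(x, node, name = "count")

node_counts
```

Each row of node_counts then corresponds to one node of the diagram, which is the shape needed for a label column.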
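I guess that if I can include the appropriate values in an extra column of the long data df, I should be able to call label = variable_name from the aesthetics?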
Try this:
library(ggplot2)
library(ggsankey)
library(dplyr)
# create a count data frame with one row per node (time point x result)
df_nr <-
  df %>%
  filter(!is.na(node)) %>%
  group_by(x, node) %>%
  summarise(count = n(), .groups = "drop")
# join the counts to the sankey data frame, stating the keys explicitly
df <-
  df %>%
  left_join(df_nr, by = c("x", "node"))
ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node,
               fill = factor(node))) +
  geom_sankey() +
  geom_sankey_label(aes(label = node), size = 3, color = "white") +
  geom_sankey_text(aes(label = count), size = 3.5, vjust = -1.5, check_overlap = TRUE) +
  scale_fill_manual(values = c("grey", "green", "red")) +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.title = element_blank())
Data
data <- data.frame(patient = 1:10,
baseline = rep("neg", 10),
test1 = c(rep("pos",3), rep("neg", 6), "inconcl"),
test2 = c( rep(NA, 3), "pos", rep("neg", 6) ))
df <- data %>%
make_long(baseline, test1, test2)
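You can adjust the position of the count text, or switch it to geom_sankey_label if you want a bounding box (I am not sure this works). I am also not sure whether geom_sankey_label recognises check_overlap, which is used here to avoid overplotting the count text multiple times.
Created on 2021-04-20 by the reprex package (v2.0.0)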
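If a separate count text above each node is not wanted, one possible variation is to put the node name and the count into a single label. The sketch below assumes that geom_sankey_label accepts a label that is constant within each node, as it does for label = node in the question. The column name node_label and the "(n = ...)" format are hypothetical choices, and this has not been checked against every ggsankey version.

```r
library(dplyr)
library(ggplot2)
library(ggsankey)

data <- data.frame(patient = 1:10,
                   baseline = rep("neg", 10),
                   test1 = c(rep("pos", 3), rep("neg", 6), "inconcl"),
                   test2 = c(rep(NA, 3), "pos", rep("neg", 6)))

df <- data %>%
  make_long(baseline, test1, test2)

# Add the per-node patient count and build one combined label per node;
# rows with a missing result get no label (node_label is a hypothetical name)
df_labeled <- df %>%
  add_count(x, node, name = "count") %>%
  mutate(node_label = ifelse(is.na(node), NA_character_,
                             paste0(node, " (n = ", count, ")")))

p <- ggplot(df_labeled,
            aes(x = x, next_x = next_x, node = node, next_node = next_node,
                fill = factor(node), label = node_label)) +
  geom_sankey() +
  geom_sankey_label(size = 3, color = "white") +
  scale_fill_manual(values = c("grey", "green", "red")) +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.title = element_blank())
p
```

Here add_count() keeps every row of the long data and appends the count, so no separate join is needed.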