plot line width (size) based on counts using ggplot or any other method in R
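I have a dataset in long format in which each ID 'walks' 3 steps. Each step (variable name step) can land on a different position (variable name milestone), and I want to plot all of the paths. Because some paths are walked more often than others, I want the width (size) of each path to be proportional to its count. I imagined something like geom_line(aes(size=..count..)) in ggplot, but it doesn't work.
Below is my code; the URL of the sample dataset is in the code. My clumsy workaround for adding width is to dodge the lines, but the result isn't proportional and it leaves gaps.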
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv" )
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
geom_line(position = position_dodge(width=0.05)) +
scale_x_discrete(limits=c("0","1","2","3","4","5","6","7","8","9")) +
scale_y_discrete(limits=c("0","1","2","3","4","5","6","7","8","9"))
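The plot from my current code looks like this, but you can see the gaps, and the widths aren't proportional.
I'd like it to look like a Sankey diagram, with width representing the count.
Does this help?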
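library(ggplot2)
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")
# stat_summary() sums y at each x within each user_id group; line width is mapped to milestone
# (fun.y was renamed fun in ggplot2 3.3.0; lines use linewidth instead of size since 3.4.0)
ggplot(ddnew, aes(x = step, y = milestone, group = user_id)) +
stat_summary(geom = "line", fun = "sum", aes(linewidth = milestone), alpha = 0.2, color = "grey50") +
scale_x_discrete(limits = factor(0:2)) +
scale_y_discrete(limits = factor(0:10)) +
theme(panel.background = element_blank(),
legend.position = "none")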
If you are looking for user-specific path counts, then this may help:
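library(dplyr)
library(ggplot2)
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")
# Build one key per user describing the full path (in step order),
# then count how many distinct users share each path
ddnew <- ddnew %>%
arrange(user_id, step) %>%
group_by(user_id) %>%
mutate(step_id = paste(step, collapse = ","),
milestone_id = paste(milestone, collapse = ",")) %>%
group_by(step_id, milestone_id) %>%
mutate(width = n_distinct(user_id)) %>%
ungroup()
# Map the path count to the line width (use size instead of linewidth in ggplot2 < 3.4.0)
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
geom_line(aes(linewidth = width)) +
scale_x_discrete(limits=c("0","1","2","3","4","5","6","7","8","9")) +
scale_y_discrete(limits=c("0","1","2","3","4","5","6","7","8","9"))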
The idea is to count the unique user-specific paths and assign those counts as the width in the geom_line() aesthetic.
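A minimal base-R sketch of the same counting idea, useful for checking the logic without dplyr. The small data frame dd is a hypothetical stand-in for data9.csv (assumed to have the columns user_id, step and milestone); tapply() and table() take the place of the group_by()/mutate() steps.

```r
# Hypothetical toy data in the same long format as data9.csv
# (columns user_id, step, milestone are assumed to match the CSV)
dd <- data.frame(
  user_id   = rep(c("a", "b", "c", "d"), each = 3),
  step      = rep(1:3, times = 4),
  milestone = c(1, 10, 2,   # user a
                1, 10, 2,   # user b walks the same path as a
                1,  2, 3,   # user c
                1, 10, 2),  # user d walks the same path as a
  stringsAsFactors = FALSE
)

# Sort so each user's milestones are pasted in step order
dd <- dd[order(dd$user_id, dd$step), ]

# One path key per user, e.g. "1,10,2"
path_of_user <- tapply(dd$milestone, dd$user_id, paste, collapse = ",")

# How many users walked each distinct path
path_counts <- table(path_of_user)

# Attach that count to every row of the user's path; this is the column
# that would be mapped to the line width in geom_line()
dd$width <- as.integer(path_counts[as.character(path_of_user[dd$user_id])])
```

Here users a, b and d share the path 1,10,2 and get width 3, while user c gets width 1.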
One option is to use the riverplot package. First, you need to summarise the data so that you can define the edges and nodes.
> library(riverplot)
> library(dplyr)
> library(tidyr)
>
> paths <- spread(ddnew, step, milestone) %>%
+ count(`1`, `2`, `3`)
> paths
Source: local data frame [9 x 4]
Groups: 1, 2 [?]
`1` `2` `3` n
<int> <int> <int> <int>
1 1 2 3 7
2 1 2 10 8
3 1 3 2 1
4 1 4 8 1
5 1 10 2 118
6 1 10 3 33
7 1 10 4 2
8 1 10 5 1
9 1 10 NA 46
Next, define your nodes (i.e., every combination of step and milestone).
prefix <- function(p, n) {paste(p, n, sep = '-')}
nodes <- distinct(ddnew, step, milestone) %>%
mutate(ID = prefix(step, milestone),
y = dense_rank(milestone)) %>%
select(ID, x = step, y)
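For comparison, a base-R sketch of the node table under the same assumption (a hypothetical toy dd in place of data9.csv). It mirrors distinct(), the prefix() IDs and dplyr's dense_rank(): matching each milestone against the sorted unique milestones gives tied values the same rank, with no gaps.

```r
# Hypothetical toy data in the long format of data9.csv
dd <- data.frame(
  user_id   = rep(c("a", "b", "c"), each = 3),
  step      = rep(1:3, times = 3),
  milestone = c(1, 10, 2,
                1, 10, 3,
                1,  2, 3),
  stringsAsFactors = FALSE
)

# Every distinct (step, milestone) combination becomes one node
nodes <- unique(dd[, c("step", "milestone")])

# Node IDs like "2-10", the same format the prefix() helper produces
nodes$ID <- paste(nodes$step, nodes$milestone, sep = "-")

# Dense rank of milestone: ties share a rank and ranks have no gaps
nodes$y <- match(nodes$milestone, sort(unique(nodes$milestone)))

# Same ID / x / y layout as the select() call above
nodes <- data.frame(ID = nodes$ID, x = nodes$step, y = nodes$y,
                    stringsAsFactors = FALSE)
```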
Then define your edges:
e12 <- group_by(paths, N1 = `1`, N2 = `2`) %>%
summarise(Value = sum(n)) %>%
ungroup() %>%
mutate(N1 = prefix(1, N1),
N2 = prefix(2, N2))
e23 <- group_by(paths, N1 = `2`, N2 = `3`) %>%
filter(!is.na(N2)) %>%
summarise(Value = sum(n)) %>%
ungroup() %>%
mutate(N1 = prefix(2, N1),
N2 = prefix(3, N2))
edges <- bind_rows(e12, e23) %>%
as.data.frame()
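A variant worth knowing: instead of summing the counted paths table for each pair of steps, the edges can be counted directly from consecutive rows of each user. This base-R sketch assumes a hypothetical toy dd in place of data9.csv, sorted by user_id and step; it produces the same N1/N2/Value layout, and transitions into a missing milestone are dropped as filter(!is.na(N2)) does above.

```r
# Hypothetical toy data in the long format of data9.csv
dd <- data.frame(
  user_id   = rep(c("a", "b", "c"), each = 3),
  step      = rep(1:3, times = 3),
  milestone = c(1, 10, 2,
                1, 10, 3,
                1, 10, NA),  # user c never reaches a third milestone
  stringsAsFactors = FALSE
)
dd <- dd[order(dd$user_id, dd$step), ]

# Node ID for every row, e.g. "2-10"; missing milestones stay NA
node <- ifelse(is.na(dd$milestone), NA,
               paste(dd$step, dd$milestone, sep = "-"))

# Pair each row with the next row of the same user: one transition per pair
n <- nrow(dd)
same_user <- dd$user_id[-1] == dd$user_id[-n]
trans <- data.frame(N1 = node[-n][same_user], N2 = node[-1][same_user],
                    stringsAsFactors = FALSE)

# Drop transitions from or into a missing milestone
trans <- trans[!is.na(trans$N1) & !is.na(trans$N2), ]

# Count how many users made each transition; this becomes the edge Value
edges <- aggregate(list(Value = rep(1L, nrow(trans))),
                   by = list(N1 = trans$N1, N2 = trans$N2), FUN = sum)
```

Because every user contributes one transition per pair of consecutive steps, an edge's Value is the number of users who made that move, which is what the band width in the river plot represents.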
Finally, make the plot:
style <- default.style()
style$srt <- '0' # display node labels horizontally
makeRiver(nodes, edges) %>% plot(default_style = style)