Plot line width (size) based on counts using ggplot or any other method in R

Question

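I have a long-format dataset in which each ID "walks" 3 steps, and each step (variable `step`) can land on a different position (variable `milestone`). I want to plot all of the paths. Because some paths are walked more often than others, I want the width (size) of each path to be proportional to its count. I imagine something like geom_line(aes(size=..count..)) in ggplot, but that does not work.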

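My code is below; it includes the URL of a sample dataset. My crude workaround for adding width is to dodge the lines, but the result is not proportional and it leaves gaps.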

library(ggplot2)
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
  geom_line(position = position_dodge(width=0.05)) +
  scale_x_discrete(limits=c("0","1","2","3","4","5","6","7","8","9")) +
  scale_y_discrete(limits=c("0","1","2","3","4","5","6","7","8","9"))

The plot from my current code looks like this, but you can see the gaps, and the widths are not proportional.

I would like it to look like a Sankey diagram, with the width representing the count.

Answer 1

Does this help?

library(ggplot2)
ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv" ) 
ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
        # `fun` replaces the `fun.y` argument, which is deprecated since ggplot2 3.3.0
        stat_summary(geom="line", fun = "sum", aes(size=milestone),alpha=0.2, color="grey50")+
        scale_x_discrete(limits=factor(0:2)) +
        scale_y_discrete(limits=factor(0:10)) +
        theme(panel.background = element_blank(), 
              legend.position = "none")

Answer 2

If you are looking for counts of user-specific paths, this might help:

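library(dplyr)
library(ggplot2)

ddnew <- read.csv("https://raw.github.com/bossaround/question/master/data9.csv")

# Build one path signature per user, then count the rows sharing each signature.
# n() counts rows, so width is the number of users on a path times its number of steps.
ddnew <- ddnew %>% 
  group_by(user_id) %>% 
  mutate(step_id = paste(step, collapse = ","), 
         milestone_id = paste(milestone, collapse = ",")) %>% 
  group_by(step_id, milestone_id) %>% 
  mutate(width = n()) %>%
  ungroup()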

ggplot(ddnew, aes(x=step, y=milestone, group=user_id)) +
  geom_line(aes(size = width)) +
  scale_x_discrete(limits=c("0","1","2","3","4","5","6","7","8","9")) +
  scale_y_discrete(limits=c("0","1","2","3","4","5","6","7","8","9"))

The idea is to count the unique user-specific paths and assign those counts as the width in the geom_line() aesthetic.
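
As a rough illustration of that idea, the sketch below counts users per path on a small made-up data frame and keeps one row per path and step, so each path is drawn once with its user count as the width. The names `toy`, `path_id` and `n_users` are invented here and are not part of the question's dataset. The counting is plain base R, the plot is only built when ggplot2 is installed, and rows are assumed to be sorted by step within each user.

```r
# Toy long-format data: hypothetical values, not the dataset from the question
toy <- data.frame(
  user_id   = rep(1:5, each = 3),
  step      = rep(1:3, times = 5),
  milestone = c(1, 10, 2,    # user 1
                1, 10, 2,    # user 2
                1, 10, 2,    # user 3
                1, 10, 3,    # user 4
                1,  2, 3)    # user 5
)

# One signature per user, e.g. "1,10,2" (rows are assumed sorted by step)
path_id <- tapply(toy$milestone, toy$user_id, paste, collapse = ",")

# Signature of the path each row belongs to
toy$path <- as.vector(path_id[as.character(toy$user_id)])

# Number of users that share each signature, attached to every row
path_counts <- table(as.vector(path_id))
toy$n_users <- as.integer(path_counts[toy$path])

# Keep one row per path and step so identical paths are not overplotted
one_per_path <- unique(toy[, c("path", "step", "milestone", "n_users")])

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # `size` follows the answer above; newer ggplot2 releases prefer `linewidth`
  p <- ggplot(one_per_path, aes(x = step, y = milestone, group = path)) +
    geom_line(aes(size = n_users), alpha = 0.5)
  if (interactive()) print(p)
}
```

Distinct paths that share a segment (here 1 to 10 between steps 1 and 2) still overlap on it, so the drawn width on that segment reflects only the widest path rather than the total. Counting per segment instead of per whole path avoids this.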

Answer 3

One option is to use the riverplot package. First, you need to summarise the data so that you can define the edges and nodes.

> library(riverplot)
> library(dplyr)   # %>%, count(), distinct(), mutate(), ...
> library(tidyr)   # spread()
> 
> paths <- spread(ddnew, step, milestone) %>%
+   count(`1`, `2`, `3`)
> paths
Source: local data frame [9 x 4]
Groups: 1, 2 [?]

    `1`   `2`   `3`     n
  <int> <int> <int> <int>
1     1     2     3     7
2     1     2    10     8
3     1     3     2     1
4     1     4     8     1
5     1    10     2   118
6     1    10     3    33
7     1    10     4     2
8     1    10     5     1
9     1    10    NA    46

Next, define your nodes (i.e. every combination of step and milestone).

prefix <- function(p, n) {paste(p, n, sep = '-')}

nodes <- distinct(ddnew, step, milestone) %>%
  mutate(ID = prefix(step, milestone),
         y = dense_rank(milestone)) %>%
  select(ID, x = step, y)
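
To see what `dense_rank()` contributes here, the following base-R sketch builds the same kind of node table on made-up data. The `toy` values and the helper `dense_rank_base()` are hypothetical. Matching against the sorted unique milestones gives consecutive ranks, so milestone 10 is placed directly after milestone 3 instead of seven units above it.

```r
# Hypothetical step/milestone combinations
toy <- data.frame(
  step      = c(1, 2, 2, 3, 3),
  milestone = c(1, 2, 10, 2, 3)
)

# Rank without gaps: ties share a rank, and ranks are consecutive integers
dense_rank_base <- function(x) match(x, sort(unique(x)))

nodes <- unique(toy)
nodes$ID <- paste(nodes$step, nodes$milestone, sep = "-")  # e.g. "2-10"
nodes$y  <- dense_rank_base(nodes$milestone)
nodes <- data.frame(ID = nodes$ID, x = nodes$step, y = nodes$y)
```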

Then define your edges:

e12 <- group_by(paths, N1 = `1`, N2 = `2`) %>%
  summarise(Value = sum(n)) %>%
  ungroup() %>%
  mutate(N1 = prefix(1, N1),
         N2 = prefix(2, N2))

e23 <- group_by(paths, N1 = `2`, N2 = `3`) %>%
  filter(!is.na(N2)) %>%
  summarise(Value = sum(n)) %>%
  ungroup() %>%
  mutate(N1 = prefix(2, N1),
         N2 = prefix(3, N2))

# Combine both edge sets and convert the tibble to a plain data frame
edges <- bind_rows(e12, e23) %>% 
  as.data.frame()
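
The `e12`/`e23` pattern above needs one block per pair of steps. As a sketch of how the same kind of edge list could be built for any number of steps, the base-R code below pairs each row with the same user's row at the next step and counts users per transition. The `toy` data and the column names `milestone_from` and `milestone_to` are made up for illustration. Users with no row at the next step simply produce no edge, which plays the role of the `filter(!is.na(N2))` call.

```r
# Toy long-format data: hypothetical values, not the dataset from the question
toy <- data.frame(
  user_id   = rep(1:5, each = 3),
  step      = rep(1:3, times = 5),
  milestone = c(1, 10, 2,  1, 10, 2,  1, 10, 2,  1, 10, 3,  1, 2, 3)
)

# Shift every row back by one step so it lines up with its predecessor
nxt <- transform(toy, step = step - 1)

# Each row of `pairs` is one user's move from `step` to `step + 1`
pairs <- merge(toy, nxt, by = c("user_id", "step"),
               suffixes = c("_from", "_to"))

# Count users per transition
edges <- aggregate(user_id ~ step + milestone_from + milestone_to,
                   data = pairs, FUN = length)
names(edges)[names(edges) == "user_id"] <- "Value"

# Node IDs in the same "step-milestone" form used by prefix() above
edges$N1 <- paste(edges$step, edges$milestone_from, sep = "-")
edges$N2 <- paste(edges$step + 1, edges$milestone_to, sep = "-")
edges <- edges[, c("N1", "N2", "Value")]
```

With real data, the resulting `N1`, `N2`, `Value` columns have the same layout as the `edges` data frame built above.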

Finally, make the plot:

style <- default.style()
style$srt <- '0'  # display node labels horizontally

makeRiver(nodes, edges) %>% plot(default_style = style)
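
If an extra package is not an option, a similar picture can be approximated with base graphics, because `lwd` in `segments()` is a plain line width. The sketch below is only an illustration under stated assumptions and is not part of the answer: the `edges` values are made up, and `edge_lwd()` and `draw_paths()` are hypothetical helpers that scale the counts linearly so that the most-used transition gets width `max_lwd`.

```r
# Hypothetical edge list: one row per transition between consecutive steps
edges <- data.frame(
  step  = c(1, 1, 2, 2, 2),
  from  = c(1, 1, 10, 10, 2),   # milestone at `step`
  to    = c(10, 2, 2, 3, 3),    # milestone at `step + 1`
  Value = c(4, 1, 3, 1, 1)      # number of users on the transition
)

# Linear scaling: line width is proportional to the count
edge_lwd <- function(value, max_lwd = 12) max_lwd * value / max(value)

draw_paths <- function(edges, max_lwd = 12) {
  # Empty canvas spanning all steps and milestones
  plot(NULL,
       xlim = range(edges$step, edges$step + 1),
       ylim = range(edges$from, edges$to),
       xlab = "step", ylab = "milestone", xaxt = "n")
  axis(1, at = sort(unique(c(edges$step, edges$step + 1))))
  # One segment per transition, semi-transparent so overlaps stay visible
  segments(x0 = edges$step,     y0 = edges$from,
           x1 = edges$step + 1, y1 = edges$to,
           lwd = edge_lwd(edges$Value, max_lwd),
           col = adjustcolor("grey30", alpha.f = 0.5),
           lend = "butt")
}

# Draw only in an interactive session
if (interactive()) draw_paths(edges)
```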
