How to create a bar chart with ggplot showing probabilities for 2 variables and 3 sub-variables

Question

I desperately need some help.

The raw data comes from https://www.hockey-reference.com/play-index/tiny.fcgi?id=mmDlH

It looks like this (CSV file):

# A tibble: 6 x 19
  match_no Date  Tm    Opp   Outcome Time      G    PP    SH     S   PIM    GA  PPGA  SHGA
     <dbl> <chr> <chr> <chr> <chr>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1        1 6/4/… NYI   WSH   W       REG       3     0     0    24     4     0     0     0
2        2 6/4/… WSH   NYI   L       REG       0     0     0    29     2     3     0     0
3        3 6/4/… STL   VAN   W       SO        3     1     0    36     6     2     2     0
4        4 6/4/… VAN   STL   L       SO        2     2     0    25     6     3     1     0
5        5 6/4/… COL   SJS   L       REG       2     0     0    30     4     5     0     0
6        6 6/4/… SJS   COL   W       REG       5     0     0    30     4     2     0     0
# … with 5 more variables: PPO <dbl>, PPOA <dbl>, SA <dbl>, OppPIM <dbl>, DIFF <dbl>

which I can transform into this:

# A tibble: 6 x 5
# Groups:   Tm [1]
  Tm    Outcome Time      n  prob
  <chr> <chr>   <chr> <int> <dbl>
1 ANA   L       OT        7  0.09
2 ANA   L       REG      37  0.45
3 ANA   L       SO        3  0.04
4 ANA   W       OT        5  0.06
5 ANA   W       REG      27  0.33
6 ANA   W       SO        3  0.04

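using this code:

library(dplyr)

# Count games per team, outcome and game type, then turn counts into within-team proportions
team_outcomes_regulation <-
  df %>%
  count(Tm, Outcome, Time) %>%
  group_by(Tm) %>%
  mutate(prob = round(prop.table(n), 2))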

Then I tried to plot it with ggplot:

library(ggplot2)

team_outcomes_regulation %>%
  ggplot(aes(x = Tm, y = prob, fill = Time)) +
  geom_bar(position = "fill", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90))

And this is what I get, but I am desperate to get the graph split into the 6 categories in total (wins by SO, REG and OT; losses by SO, REG and OT).

I now want to compare wins against goal difference, using the original df.

 # A tibble: 6 x 19
      match_no Date  Tm    Opp   Outcome Time      G    PP    SH     S   PIM    GA  PPGA  SHGA
         <dbl> <chr> <chr> <chr> <chr>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1        1 6/4/… NYI   WSH   W       REG       3     0     0    24     4     0     0     0
    2        2 6/4/… WSH   NYI   L       REG       0     0     0    29     2     3     0     0
    3        3 6/4/… STL   VAN   W       SO        3     1     0    36     6     2     2     0
    4        4 6/4/… VAN   STL   L       SO        2     2     0    25     6     3     1     0
    5        5 6/4/… COL   SJS   L       REG       2     0     0    30     4     5     0     0
    6        6 6/4/… SJS   COL   W       REG       5     0     0    30     4     2     0     0
    # … with 5 more variables: PPO <dbl>, PPOA <dbl>, SA <dbl>, OppPIM <dbl>, DIFF <dbl>

So I now want to extract: the 31 teams (Tm), number of wins (Outcome) and goal difference (sum of DIFF). Some further assistance, please?

Answer 1

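You're almost there, since you have already produced a plot split by the values listed in the "Time" column. If you want to plot every combination of the "Time" and "Outcome" columns, you need to combine those values into a single column and plot that instead. There are a few options here, but perhaps the simplest is the following: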

team_outcomes_regulation$outcome_time <-
    paste(team_outcomes_regulation$Outcome, "by", team_outcomes_regulation$Time)

Then your plot becomes:

team_outcomes_regulation %>%
    ggplot(aes(x = Tm, y = prob, fill = outcome_time)) +
    geom_bar(position = "fill",stat = "identity") +
    theme(axis.text.x = element_text(angle = 90))
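As one of the other options mentioned above, the combined grouping can also be built inside `aes()` with base R's `interaction()`, which avoids adding a column. This is only a sketch: the data frame below re-types the six ANA rows shown in the question so that the snippet runs on its own, and the legend title "Outcome by time" is an arbitrary choice.

```r
library(ggplot2)

# The six ANA rows printed in the question, re-typed so this runs standalone
team_outcomes_regulation <- data.frame(
  Tm      = "ANA",
  Outcome = c("L", "L", "L", "W", "W", "W"),
  Time    = c("OT", "REG", "SO", "OT", "REG", "SO"),
  n       = c(7, 37, 3, 5, 27, 3),
  prob    = c(0.09, 0.45, 0.04, 0.06, 0.33, 0.04)
)

# interaction() creates one factor level per Outcome/Time pair, e.g. "W by REG"
p <- ggplot(team_outcomes_regulation,
            aes(x = Tm, y = prob,
                fill = interaction(Outcome, Time, sep = " by "))) +
  geom_col(position = "fill") +   # same as geom_bar(stat = "identity")
  labs(fill = "Outcome by time") +
  theme(axis.text.x = element_text(angle = 90))

print(p)
```

With the full data set, each team's bar should then be split into six segments instead of three.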

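Edit: the follow-up question

So I now want to extract: the 31 teams (Tm), number of wins (Outcome) and goal difference (sum of DIFF). Some further assistance, please?

For this, I'm creating a dummy dataset similar to your own, which should help you picture one approach you could take. There are several ways to do this, though; what I have here is "sort of clunky", IMHO.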

# dummy data
# (columns are named with `=`; using `<-` inside data.frame() would mangle the column names)
df <- data.frame(
    Tm = sample(LETTERS[1:5], 30, replace = TRUE),
    Outcome = sample(c('W','L'), 30, replace = TRUE),
    Diff = sample(1:3, 30, replace = TRUE),
    Time = sample(c('REG', 'SO'), 30, replace = TRUE)
)

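This gives you 5 teams ("A" to "E") with random outcomes and goal differences. I also added an "extra" column (Time) to show that this approach drops the columns you don't need. The method here is to summarize the data grouped by team and outcome, and then keep only the wins. Note: this means the sum of Diff is based only on wins, not on losses. If you want to include losses, there are several other ways to do that.

library(dplyr)

df %>%
    group_by(Tm, Outcome) %>%                            # one group per team and outcome
    summarize(Wins = n(), Goal.Diff = sum(Diff)) %>%     # count games and sum Diff per group
    dplyr::filter(Outcome == 'W')                        # keep only the win rows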

# A tibble: 4 x 4
# Groups:   Tm [4]
  Tm    Outcome  Wins Goal.Diff
  <fct> <fct>   <int>     <int>
1 A     W           5        10
2 B     W           3         7
3 C     W           4         9
4 D     W           1         2
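If the goal difference should cover losses as well, one possible variant is to group by team only and count the wins with `sum(Outcome == "W")`, so no rows are dropped before summing. This is a sketch rather than a definitive answer: it re-types the first six games printed in the question, and because the DIFF column is not visible in that printout, it is recomputed here as `G - GA`, which is an assumption about how DIFF is defined.

```r
library(dplyr)

# First six games from the question's printout, re-typed so this runs standalone.
# DIFF is hidden in that printout, so it is rebuilt as G - GA (an assumption).
games <- data.frame(
  Tm      = c("NYI", "WSH", "STL", "VAN", "COL", "SJS"),
  Outcome = c("W", "L", "W", "L", "L", "W"),
  G       = c(3, 0, 3, 2, 2, 5),
  GA      = c(0, 3, 2, 3, 5, 2)
)
games$DIFF <- games$G - games$GA

team_summary <- games %>%
  group_by(Tm) %>%
  summarise(
    Wins      = sum(Outcome == "W"),  # counts wins without removing loss rows
    Goal.Diff = sum(DIFF)             # goal difference over all games, wins and losses
  )

print(team_summary)
```

A team with no wins still gets a row here (with `Wins` equal to 0), so on the full data this should return one row per team.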

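That's one way to do it. If you have other related questions, I'd suggest asking a new question on SO. You can link it back to this question if you like, but it is a separate question, so it should be asked separately.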

ggplot2

geom-bar