How to create a streak (or run of form) in R based on date, team and competition?

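I have a very specific question that I've been struggling to answer, so I thought I'd open it up to this great community.

I have a made-up dataset that looks like the one below (it is a subset of a dataset with more than 30,000 rows, so I'd like the solution to be as reproducible as possible):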

Date,Home Team,Away Team,League,Home Goals,Away Goals
43389,Everton,Wolves,League,1,3
43364,Man C,Arsenal,League,3,1
43414,Everton,Man C,League,0,2
43385,Liverpool,Bournemouth,League,3,0
43397,Man C,Chelsea,League,6,0
43390,Liverpool,Watford,League,5,0
43381,Man C,West Ham,League,1,0
43392,Man C,Arsenal,League,3,1
43369,Everton,Man C,League,0,2
43375,Liverpool,Bournemouth,League,3,0
43382,Man C,Chelsea,League,6,0
43396,Liverpool,Watford,League,5,0
43373,Man C,West Ham,League,1,0

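In R, I'd like to append to each row a winning streak showing how many matches each team has won in a row, in date order, within each competition. Ideally there would be four columns: the home team's streak at home, the away team's streak away, the home team's overall streak (home and away combined), and the away team's overall streak (home and away combined). I'm sure that once one of these is solved, the others can be built with similar code. I feel I could do this in Excel with COUNT or SUMIF, but I'm not sure how to reproduce it in R, and I'd like it to be as efficient as possible.

Thanks in advance for your help!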

> dput(Data)
structure(list(Date = structure(c(3L, 10L, 12L, 2L, 7L, 13L, 
4L, 8L, 9L, 4L, 11L, 1L, 5L, 6L), .Label = c("03/10/2018", "04/10/2018", 
"04/11/2018", "09/10/2018", "10/11/2018", "13/09/2018", "16/09/2018", 
"16/10/2018", "20/09/2018", "21/10/2018", "22/10/2018", "28/09/2018", 
"30/09/2018"), class = "factor"), Home.Team = structure(c(1L, 
3L, 1L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 2L, 3L, 2L, 3L), .Label = c("Everton", 
"Liverpool", "Man C"), class = "factor"), Away.Team = structure(c(7L, 
1L, 4L, 2L, 3L, 5L, 6L, 7L, 1L, 4L, 2L, 3L, 5L, 6L), .Label = c("Arsenal", 
"Bournemouth", "Chelsea", "Man C", "Watford", "West Ham", "Wolves"
), class = "factor"), Competition = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L), .Label = c("Cup", "League"
), class = "factor"), Home.Goals = c(1L, 3L, 0L, 3L, 6L, 5L, 
1L, 1L, 3L, 0L, 3L, 6L, 5L, 1L), Away.Goals = c(3L, 1L, 2L, 0L, 
0L, 0L, 0L, 3L, 1L, 2L, 0L, 0L, 0L, 0L)), .Names = c("Date", 
"Home.Team", "Away.Team", "Competition", "Home.Goals", "Away.Goals"
), class = "data.frame", row.names = c(NA, -14L))

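I hope the following helps. The sample data only contains teams that either always win or always lose, but running it on the full dataset should work as well.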

library(dplyr)
library(data.table)

df <- Data %>% # Data is the data frame from the question's dput()
    mutate_if(is.factor, as.character) # turn factors into characters

#          Date Home.Team   Away.Team Competition Home.Goals Away.Goals
# 1  04/11/2018   Everton      Wolves      League          1          3
# 2  21/10/2018     Man C     Arsenal      League          3          1
# 3  28/09/2018   Everton       Man C      League          0          2
# 4  04/10/2018 Liverpool Bournemouth      League          3          0
# 5  16/09/2018     Man C     Chelsea      League          6          0
# 6  30/09/2018 Liverpool     Watford      League          5          0
# 7  09/10/2018     Man C    West Ham         Cup          1          0
# 8  16/10/2018   Everton      Wolves         Cup          1          3
# 9  20/09/2018     Man C     Arsenal      League          3          1
# 10 09/10/2018   Everton       Man C      League          0          2
# 11 22/10/2018 Liverpool Bournemouth      League          3          0
# 12 03/10/2018     Man C     Chelsea         Cup          6          0
# 13 10/11/2018 Liverpool     Watford      League          5          0
# 14 13/09/2018     Man C    West Ham      League          1          0
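
One thing worth checking before any sorting by date: the `Date` column holds "dd/mm/yyyy" strings, and text sorting compares the day first, so it does not give chronological order. A minimal sketch of the difference, using three dates from the sample (the variable names are just for illustration):

```r
# Three dates taken from the sample data, stored as "dd/mm/yyyy" text.
dates_chr <- c("03/10/2018", "16/09/2018", "04/11/2018")

# Sorted as text, 3 October ends up before 16 September.
sort(dates_chr)

# Parsing with an explicit format gives Date objects that sort chronologically.
dates_parsed <- as.Date(dates_chr, format = "%d/%m/%Y")
sort(dates_parsed)
```

Converting with `as.Date()` before `arrange()` avoids streaks being counted in the wrong match order.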

# now we stack the dataframe twice on top of each other...
df <- df %>% # first...
    rename(Home.Team = Away.Team # ...with switched home and away columns
           , Away.Team = Home.Team
           , Home.Goals = Away.Goals
           , Away.Goals = Home.Goals) %>%
    bind_rows(df) # and second in the original form

#          Date   Away.Team   Home.Team Competition Away.Goals Home.Goals
# 1  04/11/2018     Everton      Wolves      League          1          3
# 2  21/10/2018       Man C     Arsenal      League          3          1
# 3  28/09/2018     Everton       Man C      League          0          2
# 4  04/10/2018   Liverpool Bournemouth      League          3          0
# 5  16/09/2018       Man C     Chelsea      League          6          0
# 6  30/09/2018   Liverpool     Watford      League          5          0
# 7  09/10/2018       Man C    West Ham         Cup          1          0
# 8  16/10/2018     Everton      Wolves         Cup          1          3
# 9  20/09/2018       Man C     Arsenal      League          3          1
# 10 09/10/2018     Everton       Man C      League          0          2
# 11 22/10/2018   Liverpool Bournemouth      League          3          0
# 12 03/10/2018       Man C     Chelsea         Cup          6          0
# 13 10/11/2018   Liverpool     Watford      League          5          0
# 14 13/09/2018       Man C    West Ham      League          1          0
# 15 04/11/2018      Wolves     Everton      League          3          1
# 16 21/10/2018     Arsenal       Man C      League          1          3
# 17 28/09/2018       Man C     Everton      League          2          0
# 18 04/10/2018 Bournemouth   Liverpool      League          0          3
# 19 16/09/2018     Chelsea       Man C      League          0          6
# 20 30/09/2018     Watford   Liverpool      League          0          5
# 21 09/10/2018    West Ham       Man C         Cup          0          1
# 22 16/10/2018      Wolves     Everton         Cup          3          1
# 23 20/09/2018     Arsenal       Man C      League          1          3
# 24 09/10/2018       Man C     Everton      League          2          0
# 25 22/10/2018 Bournemouth   Liverpool      League          0          3
# 26 03/10/2018     Chelsea       Man C         Cup          0          6
# 27 10/11/2018     Watford   Liverpool      League          0          5
# 28 13/09/2018    West Ham       Man C      League          0          1
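
The streak calculation that follows combines `rleid()` from data.table with a cumulative sum. As a rough illustration of how the two fit together, here is a sketch on a made-up vector of results (not taken from the question's data):

```r
library(data.table)  # for rleid()

# Made-up results for one team, in date order.
won <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

# rleid() gives every run of identical values its own ID.
run_id <- rleid(won)   # 1 1 2 3 3 3

# A cumulative sum inside each run counts the wins in that run;
# runs of FALSE stay at 0.
streak <- ave(as.integer(won), run_id, FUN = cumsum)   # 1 2 0 1 2 3
```

In the dplyr pipeline the same idea is expressed by grouping on the run ID and calling `cumsum(won)`.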

# after stacking, every match appears once per team: Home.Team / Home.Goals belong to
# the team whose streak we count, Away.Team / Away.Goals to its opponent
df %>%
    mutate(Date = as.Date(Date, format = "%d/%m/%Y") # parse dd/mm/yyyy so that sorting is chronological
           , won = Home.Goals > Away.Goals) %>% # now we calculate whether the match was won
    arrange(Home.Team, Date) %>% # sort by team and then date
    group_by(Home.Team) %>% # and for every team...
    mutate(streakGroup = rleid(won)) %>% # ...we calculate an ID for all the streaks
    group_by(Home.Team, streakGroup) %>% # and then for every team AND streak
    mutate(gamesWonInStreak = cumsum(won)) # we calculate the cumulative sum (number of games won in a row)

#    Date       Away.Team Home.Team   Competition Away.Goals Home.Goals won   streakGroup gamesWonInStreak
#    <date>     <chr>     <chr>       <chr>            <int>      <int> <lgl>       <int>            <int>
# 1  2018-09-20 Man C     Arsenal     League               3          1 FALSE           1                0
# 2  2018-10-21 Man C     Arsenal     League               3          1 FALSE           1                0
# 3  2018-10-04 Liverpool Bournemouth League               3          0 FALSE           1                0
# 4  2018-10-22 Liverpool Bournemouth League               3          0 FALSE           1                0
# 5  2018-09-16 Man C     Chelsea     League               6          0 FALSE           1                0
# 6  2018-10-03 Man C     Chelsea     Cup                  6          0 FALSE           1                0
# 7  2018-09-28 Man C     Everton     League               2          0 FALSE           1                0
# 8  2018-10-09 Man C     Everton     League               2          0 FALSE           1                0
# 9  2018-10-16 Wolves    Everton     Cup                  3          1 FALSE           1                0
# 10 2018-11-04 Wolves    Everton     League               3          1 FALSE           1                0
# # ... with 18 more rows
# (the first ten rows are teams that lose every match in the sample, so their streak stays at 0;
#  Man C win all 8 of their matches and end on gamesWonInStreak = 8)
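
The question also asks for streaks within each competition, split into home-only, away-only and overall, attached to the original one-row-per-match layout. One possible way to get there is sketched below. It is not part of the answer above: the fixtures are made up so that a streak gets broken, and `win_streak`, `match_id`, `venue` and the four output column names are hypothetical names chosen for illustration. It assumes the streak on a row includes that row's match.

```r
library(dplyr)
library(data.table)  # only for rleid()

# Hypothetical helper: consecutive wins so far, reset to 0 by any non-win.
win_streak <- function(won) {
  ave(as.integer(won), rleid(won), FUN = cumsum)
}

# Made-up fixtures (NOT from the question), chosen so that a streak is broken.
matches <- tibble(
  Date        = as.Date(c("2018-09-01", "2018-09-08", "2018-09-15", "2018-09-22")),
  Home.Team   = c("A", "B", "A", "B"),
  Away.Team   = c("B", "A", "B", "A"),
  Competition = "League",
  Home.Goals  = c(2L, 0L, 1L, 3L),
  Away.Goals  = c(0L, 1L, 1L, 1L)
) %>%
  mutate(match_id = row_number())  # key for joining the streaks back

# One row per team per match, remembering where the team played.
long <- bind_rows(
  matches %>% transmute(match_id, Date, Competition, team = Home.Team,
                        venue = "home", won = Home.Goals > Away.Goals),
  matches %>% transmute(match_id, Date, Competition, team = Away.Team,
                        venue = "away", won = Away.Goals > Home.Goals)
) %>%
  arrange(Date) %>%
  group_by(team, Competition) %>%          # home and away combined
  mutate(overall_streak = win_streak(won)) %>%
  group_by(team, Competition, venue) %>%   # home-only or away-only
  mutate(venue_streak = win_streak(won)) %>%
  ungroup()

# Attach the four streak columns to the original one-row-per-match layout.
result <- matches %>%
  left_join(long %>% filter(venue == "home") %>%
              select(match_id, Home.Streak = venue_streak,
                     Home.Overall.Streak = overall_streak),
            by = "match_id") %>%
  left_join(long %>% filter(venue == "away") %>%
              select(match_id, Away.Streak = venue_streak,
                     Away.Overall.Streak = overall_streak),
            by = "match_id")
```

In this toy data team A wins at home, then away, then draws, so its overall streak reads 1, 2, 0 and then stays at 0 after the final loss. If the streak going into a match is wanted instead, wrapping the streak in `dplyr::lag(..., default = 0L)` inside the same groups is one option. When a team has two matches on the same date, as Man C do in the sample, the order between them is arbitrary unless a kick-off time or similar tiebreaker is available.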