How to create a streak (or run of form) in R based on date, team and competition?
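I have a fairly specific problem that I have been struggling to solve, so I thought I would open it up to the community.
I have a fictional dataset like the one below (it is a subset of a dataset with more than 30,000 rows, so I would like the solution to be as reproducible as possible):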
Date,Home Team,Away Team,League,Home Goals,Away Goals
43389,Everton,Wolves,League,1,3
43364,Man C,Arsenal,League,3,1
43414,Everton,Man C,League,0,2
43385,Liverpool,Bournemouth,League,3,0
43397,Man C,Chelsea,League,6,0
43390,Liverpool,Watford,League,5,0
43381,Man C,West Ham,League,1,0
43392,Man C,Arsenal,League,3,1
43369,Everton,Man C,League,0,2
43375,Liverpool,Bournemouth,League,3,0
43382,Man C,Chelsea,League,6,0
43396,Liverpool,Watford,League,5,0
43373,Man C,West Ham,League,1,0
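In R, I would like to append to each row a winning streak showing how many matches each team has won in a row, in date order (within each competition). Ideally there would be four columns: the home team's streak at home, the away team's streak away, the home team's overall streak (home and away combined), and the away team's overall streak (home and away combined). I am sure that once one of these is solved, the others can be recreated with similar code. I feel I could do this in Excel with Count or Sumif, but I am not sure how to reproduce it in R, and I would like it to be as efficient as possible.
Thanks in advance for your help!!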
> dput(Data)
structure(list(Date = structure(c(3L, 10L, 12L, 2L, 7L, 13L,
4L, 8L, 9L, 4L, 11L, 1L, 5L, 6L), .Label = c("03/10/2018", "04/10/2018",
"04/11/2018", "09/10/2018", "10/11/2018", "13/09/2018", "16/09/2018",
"16/10/2018", "20/09/2018", "21/10/2018", "22/10/2018", "28/09/2018",
"30/09/2018"), class = "factor"), Home.Team = structure(c(1L,
3L, 1L, 2L, 3L, 2L, 3L, 1L, 3L, 1L, 2L, 3L, 2L, 3L), .Label = c("Everton",
"Liverpool", "Man C"), class = "factor"), Away.Team = structure(c(7L,
1L, 4L, 2L, 3L, 5L, 6L, 7L, 1L, 4L, 2L, 3L, 5L, 6L), .Label = c("Arsenal",
"Bournemouth", "Chelsea", "Man C", "Watford", "West Ham", "Wolves"
), class = "factor"), Competition = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L), .Label = c("Cup", "League"
), class = "factor"), Home.Goals = c(1L, 3L, 0L, 3L, 6L, 5L,
1L, 1L, 3L, 0L, 3L, 6L, 5L, 1L), Away.Goals = c(3L, 1L, 2L, 0L,
0L, 0L, 0L, 3L, 1L, 2L, 0L, 0L, 0L, 0L)), .Names = c("Date",
"Home.Team", "Away.Team", "Competition", "Home.Goals", "Away.Goals"
), class = "data.frame", row.names = c(NA, -14L))
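Hopefully the following helps. The sample data only contains teams that either always win or always lose, but running it on the full dataset should work as well.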
library(dplyr)
library(data.table) # only needed for rleid()
df <- Data %>% # Data is the data frame from the question's dput()
  mutate_if(is.factor, as.character) # turn factors into characters
# Date Home.Team Away.Team Competition Home.Goals Away.Goals
# 1 04/11/2018 Everton Wolves League 1 3
# 2 21/10/2018 Man C Arsenal League 3 1
# 3 28/09/2018 Everton Man C League 0 2
# 4 04/10/2018 Liverpool Bournemouth League 3 0
# 5 16/09/2018 Man C Chelsea League 6 0
# 6 30/09/2018 Liverpool Watford League 5 0
# 7 09/10/2018 Man C West Ham Cup 1 0
# 8 16/10/2018 Everton Wolves Cup 1 3
# 9 20/09/2018 Man C Arsenal League 3 1
# 10 09/10/2018 Everton Man C League 0 2
# 11 22/10/2018 Liverpool Bournemouth League 3 0
# 12 03/10/2018 Man C Chelsea Cup 6 0
# 13 10/11/2018 Liverpool Watford League 5 0
# 14 13/09/2018 Man C West Ham League 1 0
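One thing worth checking before sorting by `Date`: the column holds `dd/mm/yyyy` strings, and strings sort character by character rather than chronologically. Below is a minimal sketch of the conversion. The vectors `dates_chr` and `excel_serial` are illustrative names, and the second part assumes the numbers in the CSV excerpt are Excel serial day numbers, for which `1899-12-30` is the usual origin.

```r
# dd/mm/yyyy strings sort lexicographically, not chronologically
dates_chr <- c("03/10/2018", "16/09/2018", "28/09/2018")
sort(dates_chr)   # "03/10/2018" comes first although it is the latest date

# Parse into Date objects so that sorting follows the calendar
dates <- as.Date(dates_chr, format = "%d/%m/%Y")
sort(dates)

# Assumption: values such as 43389 are Excel serial day numbers
excel_serial <- c(43389, 43364)
as.Date(excel_serial, origin = "1899-12-30")
```

If the real data arrives in another format, only the `format` string (or the origin) should need to change.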
# now we stack the dataframe twice on top of each other...
df <- df %>% # first...
  rename(Home.Team = Away.Team # ...with switched home and away columns
         , Away.Team = Home.Team
         , Home.Goals = Away.Goals
         , Away.Goals = Home.Goals) %>%
  bind_rows(df) # and second in the original form
# Date Away.Team Home.Team Competition Away.Goals Home.Goals
# 1 04/11/2018 Everton Wolves League 1 3
# 2 21/10/2018 Man C Arsenal League 3 1
# 3 28/09/2018 Everton Man C League 0 2
# 4 04/10/2018 Liverpool Bournemouth League 3 0
# 5 16/09/2018 Man C Chelsea League 6 0
# 6 30/09/2018 Liverpool Watford League 5 0
# 7 09/10/2018 Man C West Ham Cup 1 0
# 8 16/10/2018 Everton Wolves Cup 1 3
# 9 20/09/2018 Man C Arsenal League 3 1
# 10 09/10/2018 Everton Man C League 0 2
# 11 22/10/2018 Liverpool Bournemouth League 3 0
# 12 03/10/2018 Man C Chelsea Cup 6 0
# 13 10/11/2018 Liverpool Watford League 5 0
# 14 13/09/2018 Man C West Ham League 1 0
# 15 04/11/2018 Wolves Everton League 3 1
# 16 21/10/2018 Arsenal Man C League 1 3
# 17 28/09/2018 Man C Everton League 2 0
# 18 04/10/2018 Bournemouth Liverpool League 0 3
# 19 16/09/2018 Chelsea Man C League 0 6
# 20 30/09/2018 Watford Liverpool League 0 5
# 21 09/10/2018 West Ham Man C Cup 0 1
# 22 16/10/2018 Wolves Everton Cup 3 1
# 23 20/09/2018 Arsenal Man C League 1 3
# 24 09/10/2018 Man C Everton League 2 0
# 25 22/10/2018 Bournemouth Liverpool League 0 3
# 26 03/10/2018 Chelsea Man C Cup 0 6
# 27 10/11/2018 Watford Liverpool League 0 5
# 28 13/09/2018 West Ham Man C League 0 1
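The stacking step relies on `bind_rows()` matching columns by name rather than by position, which is why exchanging the column names is enough to flip each match to the other team's point of view. Here is a small sketch on two made-up matches. The objects `toy`, `swapped` and `stacked` are illustrative, and base `names<-` is used in place of `rename()` purely to make the swap explicit.

```r
library(dplyr)

# Two made-up matches using the question's column names
toy <- data.frame(
  Home.Team  = c("Man C", "Everton"),
  Away.Team  = c("Chelsea", "Man C"),
  Home.Goals = c(6L, 0L),
  Away.Goals = c(0L, 2L),
  stringsAsFactors = FALSE
)

# Same values, but the home/away labels are exchanged
swapped <- toy
names(swapped) <- c("Away.Team", "Home.Team", "Away.Goals", "Home.Goals")

# bind_rows() aligns on column names, so every match now appears twice,
# once from each team's perspective
stacked <- bind_rows(swapped, toy)

stacked
```

After this step every team that played shows up in `Home.Team` once per match, with its own goals in `Home.Goals`, so grouping by `Home.Team` covers both home and away games.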
df %>%
  mutate(Date = as.Date(Date, format = "%d/%m/%Y"), # parse the dates so they sort chronologically
         won = Home.Goals > Away.Goals) %>% # after stacking, Home.Team is the team whose streak we track
  arrange(Home.Team, Date) %>% # sort by home.team and then date
  group_by(Home.Team) %>% # and for every home.team...
  mutate(streakGroup = rleid(won)) %>% # ...we calculate an ID for all the streaks
  group_by(Home.Team, streakGroup) %>% # and then for every home.team AND streak
  mutate(gamesWonInStreak = cumsum(won)) # we calculate the cumulative sum (number of games won in a row)
# Date Away.Team Home.Team Competition Away.Goals Home.Goals won streakGroup gamesWonInStreak
# <date> <chr> <chr> <chr> <int> <int> <lgl> <int> <int>
# 1 2018-09-20 Man C Arsenal League 3 1 FALSE 1 0
# 2 2018-10-21 Man C Arsenal League 3 1 FALSE 1 0
# 3 2018-10-04 Liverpool Bournemouth League 3 0 FALSE 1 0
# 4 2018-10-22 Liverpool Bournemouth League 3 0 FALSE 1 0
# 5 2018-09-16 Man C Chelsea League 6 0 FALSE 1 0
# 6 2018-10-03 Man C Chelsea Cup 6 0 FALSE 1 0
# 7 2018-09-28 Man C Everton League 2 0 FALSE 1 0
# 8 2018-10-09 Man C Everton League 2 0 FALSE 1 0
# 9 2018-10-16 Wolves Everton Cup 3 1 FALSE 1 0
# 10 2018-11-04 Wolves Everton League 3 1 FALSE 1 0
# # ... with 18 more rows
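To get from the stacked table to the four columns asked for in the question, one possible route is to keep a venue flag while stacking, compute the streaks per team and competition (and per venue), and join the results back onto the original rows. The sketch below uses made-up results with a mix of wins, a draw and losses. The helper `win_streak()` and the column names `Home.Streak.Home`, `Home.Streak.Overall`, `Away.Streak.Away` and `Away.Streak.Overall` are illustrative, and the joins assume a team plays at most one match per date in a competition.

```r
library(dplyr)

# Made-up results (not the question's data): A wins twice, draws, then loses
matches <- data.frame(
  Date        = as.Date(c("2018-09-01", "2018-09-08", "2018-09-15", "2018-09-22")),
  Home.Team   = c("A", "B", "A", "B"),
  Away.Team   = c("B", "A", "B", "A"),
  Competition = "League",
  Home.Goals  = c(2L, 0L, 1L, 3L),
  Away.Goals  = c(0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Running count of consecutive wins; drops back to 0 after any non-win
win_streak <- function(won) {
  ave(as.integer(won), data.table::rleid(won), FUN = cumsum)
}

# One row per team per match, with a flag for where the team played
long <- bind_rows(
  transmute(matches, Date, Competition, Team = Home.Team, Venue = "Home",
            won = Home.Goals > Away.Goals),
  transmute(matches, Date, Competition, Team = Away.Team, Venue = "Away",
            won = Away.Goals > Home.Goals)
) %>%
  arrange(Date) %>%
  group_by(Team, Competition) %>%
  mutate(overall_streak = win_streak(won)) %>%   # home and away combined
  group_by(Team, Competition, Venue) %>%
  mutate(venue_streak = win_streak(won)) %>%     # home-only or away-only
  ungroup()

# Attach the streaks to the original one-row-per-match table
result <- matches %>%
  left_join(long %>% filter(Venue == "Home") %>%
              select(Date, Competition, Home.Team = Team,
                     Home.Streak.Home = venue_streak,
                     Home.Streak.Overall = overall_streak),
            by = c("Date", "Competition", "Home.Team")) %>%
  left_join(long %>% filter(Venue == "Away") %>%
              select(Date, Competition, Away.Team = Team,
                     Away.Streak.Away = venue_streak,
                     Away.Streak.Overall = overall_streak),
            by = c("Date", "Competition", "Away.Team"))

result
```

As in the answer above, the streak counts the current match. If the streak going into each match is wanted instead, wrapping the helper in `dplyr::lag(..., default = 0L)` inside the grouped `mutate()` calls should give that.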