Convert string vector to dataframe in R
I'm working on a quick scraping project that involves scraping historical NFL football data. Here is a quick look at my data:
allgames_thisweek = c("Chicago Bears 21, Tampa Bay Buccaneers 9 -- Box Score", "Cleveland Browns 28, Cincinnati Bengals 20 -- Box Score",
"Dallas Cowboys 26, Pittsburgh Steelers 9 -- Box Score", "Detroit Lions 31, Atlanta Falcons 28 (OT) -- Box Score",
"Green Bay Packers 16, Minnesota Vikings 10 -- Box Score", "Indianapolis Colts 45, Houston Oilers 21 -- Box Score",
"Kansas City Chiefs 30, New Orleans Saints 17 -- Box Score",
"Los Angeles Rams 14, Arizona Cardinals 12 -- Box Score", "Miami Dolphins 39, New England Patriots 35 -- Box Score",
"New York Giants 28, Philadelphia Eagles 23 -- Box Score", "New York Jets 23, Buffalo Bills 3 -- Box Score",
"San Diego Chargers 37, Denver Broncos 34 -- Box Score", "San Francisco 49ers 44, Los Angeles Raiders 14 -- Box Score",
"Seattle Seahawks 28, Washington Redskins 7 -- Box Score")
allgames_thisweek[1]
"Chicago Bears 21, Tampa Bay Buccaneers 9 -- Box Score"
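Each line contains the following data: [team1, team1score, team2, team2score, --, Box Score]
My data always has exactly the same format: the first team's score is always followed by a comma, and the second team's score is always followed by a --. I'd like to create a dataframe with 4 columns (team1, team1score, team2, team2score), so the output might look like this:
output_df
   team1         team1score team2                team2score
1. Chicago Bears 21         Tampa Bay Buccaneers 9
Any ideas on how to achieve this? Any help is appreciated! Thanks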
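You can use dplyr + stringr:
library(dplyr)    # provides the %>% pipe
library(stringr)

# note: backslashes are doubled inside R string literals
string %>%
  # drop everything after the second score, e.g. " -- Box Score" or " (OT) -- Box Score"
  str_replace("(?<=\\d)\\s.*--.+$", "") %>%
  # turn the space in front of each score into a comma
  str_replace_all("\\s(?=\\d+\\b)", ",") %>%
  # split on commas, also consuming the space left after the first score
  strsplit(",\\s*") %>%
  do.call(rbind, .) %>%
  data.frame() %>%
  setNames(c("team1", "team1score", "team2", "team2score"))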
Result:
   team1               team1score team2                team2score
1  Chicago Bears       21         Tampa Bay Buccaneers 9
2  Cleveland Browns    28         Cincinnati Bengals   20
3  Dallas Cowboys      26         Pittsburgh Steelers  9
4  Detroit Lions       31         Atlanta Falcons      28
5  Green Bay Packers   16         Minnesota Vikings    10
6  Indianapolis Colts  45         Houston Oilers       21
7  Kansas City Chiefs  30         New Orleans Saints   17
8  Los Angeles Rams    14         Arizona Cardinals    12
9  Miami Dolphins      39         New England Patriots 35
10 New York Giants     28         Philadelphia Eagles  23
11 New York Jets       23         Buffalo Bills        3
12 San Diego Chargers  37         Denver Broncos       34
13 San Francisco 49ers 44         Los Angeles Raiders  14
14 Seattle Seahawks    28         Washington Redskins  7
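To see how the pipeline arrives at this result, the two replacement steps can be run one at a time on a single string. This is only a sketch: it uses base R's sub() and gsub() with perl = TRUE in place of the stringr calls so that it runs without extra packages, and the variable names game, trimmed and delimited are made up for illustration.

```r
# One game string taken from the data, including the "(OT)" case
game <- "Detroit Lions 31, Atlanta Falcons 28 (OT) -- Box Score"

# Step 1 (lookbehind): starting at the whitespace that directly follows a digit,
# remove everything up to and including the trailing "-- Box Score".
trimmed <- sub("(?<=\\d)\\s.*--.+$", "", game, perl = TRUE)
print(trimmed)
# [1] "Detroit Lions 31, Atlanta Falcons 28"

# Step 2 (lookahead): replace each whitespace that sits directly before
# a whole number (digits followed by a word boundary) with a comma.
delimited <- gsub("\\s(?=\\d+\\b)", ",", trimmed, perl = TRUE)
print(delimited)
# [1] "Detroit Lions,31, Atlanta Falcons,28"

# "49ers" is left alone: its digits are followed by letters, so \b fails there.
niners <- gsub("\\s(?=\\d+\\b)", ",", "San Francisco 49ers 44", perl = TRUE)
print(niners)
# [1] "San Francisco 49ers,44"
```

After step 2 every field boundary is marked by a comma, which is what makes the later split into four columns possible.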
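Notes:
(?<=\d)\s.*--.+$ matches a whitespace character (\s) followed by any character zero or more times (.*), a literal --, any character one or more times (.+), and the end of the string ($). The pattern has one additional condition: the match must come directly after a digit, (?<=\d).
(?<=...) is called a positive lookbehind. It checks that whatever follows it is immediately preceded by the pattern in ..., without including that pattern in the match.
\s(?=\d+\b) matches a whitespace character that is immediately followed ((?=...)) by one or more digits and a word boundary (\b). So this matches the space between a team name and that team's score.
(?=...) is a positive lookahead. It checks that whatever precedes it is immediately followed by the pattern in ..., again without including it in the match.
(Inside an R string literal each backslash is doubled, e.g. "\\d".)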
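The lookarounds are one way to mark the field boundaries. As an alternative sketch, the four fields could also be pulled out with a single pattern that uses capture groups, via utils::strcapture() from base R (available since R 3.4.0). The pattern and the names games, pattern and scores are assumptions for illustration, not part of the answer above, and the pattern assumes every line follows the "team score, team score ..." layout shown in the data.

```r
# A few of the game strings from the data
games <- c(
  "Chicago Bears 21, Tampa Bay Buccaneers 9 -- Box Score",
  "Detroit Lions 31, Atlanta Falcons 28 (OT) -- Box Score",
  "San Francisco 49ers 44, Los Angeles Raiders 14 -- Box Score"
)

# Four capture groups: name, score, name, score.
# Each lazy (.+?) grows only until the rest of the pattern can match,
# so a name ends at the space in front of its score.
pattern <- "^(.+?) (\\d+), (.+?) (\\d+)\\b.*$"

# proto supplies the column names and types of the resulting data frame
scores <- strcapture(
  pattern, games,
  proto = data.frame(
    team1 = character(), team1score = integer(),
    team2 = character(), team2score = integer(),
    stringsAsFactors = FALSE
  ),
  perl = TRUE
)

print(scores)
```

Because proto declares the score columns as integer(), they come back as numbers rather than text, which the split-based approach would need an extra conversion step for.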
Data:
string = c("Chicago Bears 21, Tampa Bay Buccaneers 9 -- Box Score", "Cleveland Browns 28, Cincinnati Bengals 20 -- Box Score",
"Dallas Cowboys 26, Pittsburgh Steelers 9 -- Box Score", "Detroit Lions 31, Atlanta Falcons 28 (OT) -- Box Score",
"Green Bay Packers 16, Minnesota Vikings 10 -- Box Score", "Indianapolis Colts 45, Houston Oilers 21 -- Box Score",
"Kansas City Chiefs 30, New Orleans Saints 17 -- Box Score",
"Los Angeles Rams 14, Arizona Cardinals 12 -- Box Score", "Miami Dolphins 39, New England Patriots 35 -- Box Score",
"New York Giants 28, Philadelphia Eagles 23 -- Box Score", "New York Jets 23, Buffalo Bills 3 -- Box Score",
"San Diego Chargers 37, Denver Broncos 34 -- Box Score", "San Francisco 49ers 44, Los Angeles Raiders 14 -- Box Score",
"Seattle Seahawks 28, Washington Redskins 7 -- Box Score")