Merge two data frames using ID and matching string
Suppose I have two data frames:
df1 <- data.frame(eventId = c("6770583", "6770529"), home = c("Real Salt Lake", "Vancouver Whitecaps Fc"), away = c("New England Revolution", "Sporting Kansas City"))
df2 <- data.frame(eventId = c("6770583", "6770583", "6770529", "6770529"), currentOddType = c("New England Revolution to win 1-0, 2-0 or 2-1", "Real Salt Lake to win 1-0, 2-0 or 2-1", "Sporting Kansas City to win 1-0, 2-0 or 2-1", "Vancouver Whitecaps to win 1-0, 2-0 or 2-1"), currentOdds = c("7", "4", "4.33", "4.5"))
I want to merge them on both eventId and the team name, because eventId alone is repeated in df2.
The desired result looks like this:
dfFinal <- data.frame(eventId = c("6770583", "6770529"), home = c("Real Salt Lake", "Vancouver Whitecaps Fc"), away = c("New England Revolution", "Sporting Kansas City"), homeOdd = c("4", "4.5"), awayOdd = c("7", "4.33"))
dfFinal
Also, if there is no match, homeOdd and awayOdd should be NA.
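For comparison, the exact-match lookup described in the question can be sketched in base R with a composite "eventId|team" key and `match()`, which returns NA when a key has no hit. This is a minimal sketch, assuming every `currentOddType` has the form "Team to win ..." and that team names never contain "|"; `lookup_odds` is a hypothetical helper name. With exact matching, the Vancouver row gets NA for homeOdd because "Vancouver Whitecaps Fc" and "Vancouver Whitecaps" are different strings.

```r
df1 <- data.frame(eventId = c("6770583", "6770529"),
                  home = c("Real Salt Lake", "Vancouver Whitecaps Fc"),
                  away = c("New England Revolution", "Sporting Kansas City"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(eventId = c("6770583", "6770583", "6770529", "6770529"),
                  currentOddType = c("New England Revolution to win 1-0, 2-0 or 2-1",
                                     "Real Salt Lake to win 1-0, 2-0 or 2-1",
                                     "Sporting Kansas City to win 1-0, 2-0 or 2-1",
                                     "Vancouver Whitecaps to win 1-0, 2-0 or 2-1"),
                  currentOdds = c("7", "4", "4.33", "4.5"),
                  stringsAsFactors = FALSE)

# Assumption: the team name is everything before " to win"
df2$team <- sub("\\s+to win.*$", "", df2$currentOddType)

# Composite key "eventId|team" (assumes "|" never appears in a team name)
key2 <- paste(df2$eventId, df2$team, sep = "|")

# Hypothetical helper: odds for each (eventId, team) pair, NA when there is no exact match
lookup_odds <- function(event_ids, teams) {
  df2$currentOdds[match(paste(event_ids, teams, sep = "|"), key2)]
}

dfFinal <- df1
dfFinal$homeOdd <- lookup_odds(df1$eventId, df1$home)
dfFinal$awayOdd <- lookup_odds(df1$eventId, df1$away)
dfFinal
```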
We can combine gather/spread with left_join:
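library(dplyr)
library(tidyr)

df1 %>%
  # long format: one row per (eventId, type, team), where type is "home" or "away"
  gather(type, team, -eventId) %>%
  left_join(
    df2 %>%
      # split "Team to win ..." into the team name and the bet description
      separate(currentOddType, into = c("team", "type"), sep = "\\s(?=to win)") %>%
      select(eventId, team, currentOdds),
    by = c("eventId", "team")) %>%
  # pack team and odds into one column so spread() can widen both at once
  unite(val, team, currentOdds) %>%
  spread(type, val) %>%
  # note: an unmatched odd comes back as the string "NA", because unite() pastes NA as text;
  # use dplyr::na_if(homeOdd, "NA") afterwards if a real NA is needed
  separate(away, into = c("away", "awayOdd"), sep = "_") %>%
  separate(home, into = c("home", "homeOdd"), sep = "_")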
#   eventId                   away awayOdd                   home homeOdd
# 1 6770529   Sporting Kansas City    4.33 Vancouver Whitecaps Fc      NA
# 2 6770583 New England Revolution       7         Real Salt Lake       4
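In current tidyr, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the same join with those functions follows; because `pivot_wider()` can widen several value columns at once, the `unite()`/`separate()` round trip is not needed and unmatched odds stay real NA values. It assumes tidyr >= 1.0.0 and extracts the team name with base `sub()` instead of `separate()`, which is a swapped-in choice rather than part of the answer above.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(eventId = c("6770583", "6770529"),
                  home = c("Real Salt Lake", "Vancouver Whitecaps Fc"),
                  away = c("New England Revolution", "Sporting Kansas City"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(eventId = c("6770583", "6770583", "6770529", "6770529"),
                  currentOddType = c("New England Revolution to win 1-0, 2-0 or 2-1",
                                     "Real Salt Lake to win 1-0, 2-0 or 2-1",
                                     "Sporting Kansas City to win 1-0, 2-0 or 2-1",
                                     "Vancouver Whitecaps to win 1-0, 2-0 or 2-1"),
                  currentOdds = c("7", "4", "4.33", "4.5"),
                  stringsAsFactors = FALSE)

# Assumption: the team name is the text before " to win"
odds <- df2 %>%
  mutate(team = sub("\\s+to win.*$", "", currentOddType)) %>%
  select(eventId, team, currentOdds)

res <- df1 %>%
  # one row per (eventId, side, team)
  pivot_longer(c(home, away), names_to = "side", values_to = "team") %>%
  left_join(odds, by = c("eventId", "team")) %>%
  # back to one row per event: team_home, team_away, currentOdds_home, currentOdds_away
  pivot_wider(names_from = side, values_from = c(team, currentOdds)) %>%
  select(eventId, home = team_home, away = team_away,
         homeOdd = currentOdds_home, awayOdd = currentOdds_away)

res
```

Row order follows df1 here, since `pivot_wider()` keeps rows in order of first appearance rather than sorting by eventId as `spread()` does.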
Note that homeOdd for Vancouver Whitecaps Fc becomes NA, because the team name differs between df1 and df2 (Vancouver Whitecaps Fc vs. Vancouver Whitecaps).
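If near-miss names like these should still match, one possible heuristic (not part of the answer above) is to accept a match within the same eventId when one team name is a prefix of the other, and to give up with NA unless exactly one candidate qualifies. This is a minimal base R sketch under that assumption; `fuzzy_odds` is a hypothetical helper name, and prefix matching can still pair the wrong teams if two names in one event share a prefix, so the ambiguity check matters.

```r
df1 <- data.frame(eventId = c("6770583", "6770529"),
                  home = c("Real Salt Lake", "Vancouver Whitecaps Fc"),
                  away = c("New England Revolution", "Sporting Kansas City"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(eventId = c("6770583", "6770583", "6770529", "6770529"),
                  currentOddType = c("New England Revolution to win 1-0, 2-0 or 2-1",
                                     "Real Salt Lake to win 1-0, 2-0 or 2-1",
                                     "Sporting Kansas City to win 1-0, 2-0 or 2-1",
                                     "Vancouver Whitecaps to win 1-0, 2-0 or 2-1"),
                  currentOdds = c("7", "4", "4.33", "4.5"),
                  stringsAsFactors = FALSE)

# Assumption: the team name is the text before " to win"
df2$team <- sub("\\s+to win.*$", "", df2$currentOddType)

# Hypothetical helper: within one event, accept a candidate when either name is a
# prefix of the other (e.g. "Vancouver Whitecaps" vs "Vancouver Whitecaps Fc");
# return NA unless exactly one candidate qualifies
fuzzy_odds <- function(id, team) {
  cand <- df2[df2$eventId == id, ]
  hit <- startsWith(team, cand$team) | startsWith(cand$team, team)
  if (sum(hit) == 1) cand$currentOdds[hit] else NA_character_
}

dfFinal <- df1
dfFinal$homeOdd <- mapply(fuzzy_odds, df1$eventId, df1$home, USE.NAMES = FALSE)
dfFinal$awayOdd <- mapply(fuzzy_odds, df1$eventId, df1$away, USE.NAMES = FALSE)
dfFinal
```

On the sample data this reproduces the desired dfFinal from the question, including 4.5 for Vancouver Whitecaps Fc.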