Subsetting by conjunction of two columns
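I want to subset my data by conditioning on two columns at the same time.
Similar to the question here: subsetting data using multiple variables in R
For example:
Say I have this dataset called Gamedat: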
Games People Hoursplayed
goldeneye Michael 5
goldeneye Thatcher 8
goldeneye Dexter 12
goldeneye Dexter 15
pacman Dexter 2
tetris Clint 5
tetris Dexter 8
goldeneye Thatcher 12
pacman Thatcher 15
goldeneye Clint 2
pacman Michael 5
pacman Michael 8
pacman Clint 12
tetris John 15
tetris Clint 2
ageofempires Clint 5
pacman Dexter 8
ageofempires Thatcher 12
ageofempires John 15
goldeneye Dexter 2
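Say I want to look at one game, such as goldeneye. I want to see how many players played other games for the same number of hours as they played goldeneye (this is more useful in my real dataset).
So I do this:
Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]
Then I do this: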
subset(Gamedat, Gamedat[ ,2] %in% Gameofinterest[ ,2] &
Gamedat[ ,3] %in% Gameofinterest[ ,3])
But this gives me:
Games People Hoursplayed
goldeneye Michael 5
goldeneye Thatcher 8
goldeneye Dexter 12
goldeneye Dexter 15
pacman Dexter 2
tetris Clint 5
tetris Dexter 8
goldeneye Thatcher 12
pacman Thatcher 15
goldeneye Clint 2
pacman Michael 5
pacman Michael 8
pacman Clint 12
tetris Clint 2
ageofempires Clint 5
pacman Dexter 8
ageofempires Thatcher 12
goldeneye Dexter 2
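A minimal sketch of why the two separate %in% tests let extra rows through, using four rows copied from the example data. The names toy, gold, in_people, in_hours and clint5_in_gold are illustrative and not part of the original post.

```r
# Four rows copied from the example data.
toy <- data.frame(
  Games = c("goldeneye", "goldeneye", "tetris", "tetris"),
  People = c("Michael", "Clint", "Clint", "Clint"),
  Hoursplayed = c(5, 2, 5, 2),
  stringsAsFactors = FALSE
)
gold <- toy[toy$Games == "goldeneye", ]

# Each column is tested against its own pool of goldeneye values.
in_people <- toy$People %in% gold$People
in_hours <- toy$Hoursplayed %in% gold$Hoursplayed
print(cbind(toy, in_people, in_hours))

# "tetris Clint 5" passes both tests: Clint played goldeneye, and somebody
# else (Michael) played it for 5 hours. The pair (Clint, 5) never occurs:
clint5_in_gold <- any(gold$People == "Clint" & gold$Hoursplayed == 5)
print(clint5_in_gold)
```

Every row passes both column-wise tests, so the `&` between them cannot tell a genuine (People, Hoursplayed) match from a coincidence across different rows.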
When what I really want is:
Games People Hoursplayed
goldeneye Michael 5
goldeneye Thatcher 8
goldeneye Dexter 12
goldeneye Dexter 15
pacman Dexter 2
goldeneye Thatcher 12
goldeneye Clint 2
pacman Michael 5
tetris Clint 2
ageofempires Thatcher 12
goldeneye Dexter 2
In short, I want to find the cases that match "People & Hoursplayed" as a pair, rather than "People" and "Hoursplayed" separately... does that make sense?
I know I can do:
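Gamedat$PHpaste <- paste(Gamedat$People, Gamedat$Hoursplayed, sep="")
# Rebuild Gameofinterest so that it also carries the new PHpaste column
Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]
Gamedat[Gamedat[ ,4] %in% Gameofinterest[ ,4], ]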
and get:
Games People Hoursplayed PHpaste
goldeneye Michael 5 Michael5
goldeneye Thatcher 8 Thatcher8
goldeneye Dexter 12 Dexter12
goldeneye Dexter 15 Dexter15
pacman Dexter 2 Dexter2
goldeneye Thatcher 12 Thatcher12
goldeneye Clint 2 Clint2
pacman Michael 5 Michael5
tetris Clint 2 Clint2
ageofempires Thatcher 12 Thatcher12
goldeneye Dexter 2 Dexter2
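The pasted key can also be built on the fly, without storing a fourth column. The sketch below assumes a separator that cannot occur in either column; with sep = "" two different pairs can collapse into the same key. The helper pair_key, the "|" separator and the values "A1" and "A" are hypothetical and chosen only for illustration.

```r
# One key per row; the separator is an assumption: any string that cannot
# appear in either column will do.
pair_key <- function(d, sep = "|") paste(d$People, d$Hoursplayed, sep = sep)

# Why a separator matters (hypothetical values, not from Gamedat):
collide <- paste(c("A1", "A"), c(2, 12), sep = "")     # both pairs give "A12"
distinct <- paste(c("A1", "A"), c(2, 12), sep = "|")   # "A1|2" and "A|12"

# Five rows copied from the example data.
dat <- data.frame(
  Games = c("goldeneye", "goldeneye", "pacman", "pacman", "tetris"),
  People = c("Michael", "Dexter", "Michael", "Michael", "Dexter"),
  Hoursplayed = c(5, 2, 5, 8, 8),
  stringsAsFactors = FALSE
)
gold <- dat[dat$Games == "goldeneye", ]

# Subset on the key directly; dat keeps its original three columns.
res <- dat[pair_key(dat) %in% pair_key(gold), ]
print(res)
```

Here res keeps the two goldeneye rows plus "pacman Michael 5", the only other row whose pair also occurs in goldeneye.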
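But is there something more elegant?
I think this can be done with dplyr. First, use filter to retrieve the rows where the game is goldeneye. Then use inner_join to join those rows back to the original data on People and Hoursplayed. Optionally, select the desired columns and arrange by People.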
library(dplyr)
Gamedat %>%
  filter(Games == "goldeneye") %>%
  inner_join(Gamedat, by = c("People", "Hoursplayed")) %>%
  select(Games = Games.y, People, Hoursplayed) %>%
  arrange(People)
Result:
Games People Hoursplayed
1 goldeneye Clint 2
2 tetris Clint 2
3 goldeneye Dexter 12
4 goldeneye Dexter 15
5 pacman Dexter 2
6 goldeneye Dexter 2
7 goldeneye Michael 5
8 pacman Michael 5
9 goldeneye Thatcher 8
10 goldeneye Thatcher 12
11 ageofempires Thatcher 12
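A possible variation, offered as a sketch rather than as part of the answer above: dplyr's semi_join is a filtering join, so it keeps the rows of Gamedat that have a matching (People, Hoursplayed) pair without adding columns or repeating rows, and it preserves the original row order. The names goldeneye_pairs and matched are illustrative.

```r
library(dplyr)

# The example data from the question.
Gamedat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Games People Hoursplayed
goldeneye Michael 5
goldeneye Thatcher 8
goldeneye Dexter 12
goldeneye Dexter 15
pacman Dexter 2
tetris Clint 5
tetris Dexter 8
goldeneye Thatcher 12
pacman Thatcher 15
goldeneye Clint 2
pacman Michael 5
pacman Michael 8
pacman Clint 12
tetris John 15
tetris Clint 2
ageofempires Clint 5
pacman Dexter 8
ageofempires Thatcher 12
ageofempires John 15
goldeneye Dexter 2
")

# The (People, Hoursplayed) pairs that occur for goldeneye.
goldeneye_pairs <- Gamedat %>%
  filter(Games == "goldeneye") %>%
  distinct(People, Hoursplayed)

# Keep every row of Gamedat whose pair appears among those pairs.
matched <- Gamedat %>%
  semi_join(goldeneye_pairs, by = c("People", "Hoursplayed"))

print(matched)
```

On this data, matched has the same 11 rows as the desired output in the question, in the same order. Because semi_join only filters, no Games.x / Games.y columns are created and no select step is needed.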