Subsetting by conjunction of two columns

I want to subset my data by conditioning on two columns simultaneously.

Similar to this question: subsetting data using multiple variables in R

For example:

Say I have this dataset called Gamedat:
        Games    People Hoursplayed
    goldeneye   Michael           5
    goldeneye  Thatcher           8
    goldeneye    Dexter          12
    goldeneye    Dexter          15
       pacman    Dexter           2
       tetris     Clint           5
       tetris    Dexter           8
    goldeneye  Thatcher          12
       pacman  Thatcher          15
    goldeneye     Clint           2
       pacman   Michael           5
       pacman   Michael           8
       pacman     Clint          12
       tetris      John          15
       tetris     Clint           2
 ageofempires     Clint           5
       pacman    Dexter           8
 ageofempires  Thatcher          12
 ageofempires      John          15
    goldeneye    Dexter           2

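Say I want to look at one game, such as goldeneye. I want to see how many people played other games for the same number of hours that they played goldeneye (this is more useful in my real dataset).

So I do this: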

 Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]

Then I do this:

  subset(Gamedat, Gamedat[ ,2] %in% Gameofinterest[ ,2] & 
  Gamedat[ ,3] %in% Gameofinterest[ ,3])

But this gives me:

       Games   People Hoursplayed
   goldeneye  Michael           5
   goldeneye Thatcher           8
   goldeneye   Dexter          12
   goldeneye   Dexter          15
      pacman   Dexter           2
      tetris    Clint           5
      tetris   Dexter           8
   goldeneye Thatcher          12
      pacman Thatcher          15
   goldeneye    Clint           2
      pacman  Michael           5
      pacman  Michael           8
      pacman    Clint          12
      tetris    Clint           2
ageofempires    Clint           5
      pacman   Dexter           8
ageofempires Thatcher          12
   goldeneye   Dexter           2

When what I actually want is:

         Games   People Hoursplayed
     goldeneye  Michael           5
     goldeneye Thatcher           8
     goldeneye   Dexter          12
     goldeneye   Dexter          15
        pacman   Dexter           2
     goldeneye Thatcher          12
     goldeneye    Clint           2
        pacman  Michael           5
        tetris    Clint           2
  ageofempires Thatcher          12
     goldeneye   Dexter           2

In short, I want to find the rows that match on "People & Hoursplayed" as a pair, rather than on "People" and on "Hoursplayed" separately. Does that make sense?
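To make the distinction concrete, here is a minimal sketch on a made-up four-row data frame. The names toy, ref, separate and pairwise are hypothetical and used only for illustration. It assumes base R's interaction() is an acceptable way to build one key per row from both columns:

```r
# Toy data, invented for this illustration only
toy <- data.frame(
  Games       = c("goldeneye", "goldeneye", "pacman", "pacman"),
  People      = c("Ann", "Bob", "Ann", "Bob"),
  Hoursplayed = c(1, 2, 2, 2),
  stringsAsFactors = FALSE
)
ref <- toy[toy$Games == "goldeneye", ]

# Two independent tests: each column is checked on its own
separate <- toy$People %in% ref$People &
  toy$Hoursplayed %in% ref$Hoursplayed

# Pairwise test: the (People, Hoursplayed) combination must occur in ref
key <- function(d) as.character(interaction(d$People, d$Hoursplayed, drop = TRUE))
pairwise <- key(toy) %in% key(ref)

separate  # TRUE TRUE TRUE TRUE
pairwise  # TRUE TRUE FALSE TRUE
```

The third row (Ann, pacman, 2 hours) passes the separate tests because Ann appears in ref and 2 hours appears in ref (for Bob), even though Ann herself played goldeneye for 1 hour. Only the pairwise test rejects it.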

I know I can do:

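 Gamedat$PHpaste <- paste(Gamedat$People, Gamedat$Hoursplayed, sep="")

 # rebuild Gameofinterest so that it also carries the new PHpaste column
 Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]

 Gamedat[Gamedat[ ,4] %in% Gameofinterest[ ,4], ]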

And get:

        Games   People Hoursplayed    PHpaste
    goldeneye  Michael           5   Michael5
    goldeneye Thatcher           8  Thatcher8
    goldeneye   Dexter          12   Dexter12
    goldeneye   Dexter          15   Dexter15
       pacman   Dexter           2    Dexter2
    goldeneye Thatcher          12 Thatcher12
    goldeneye    Clint           2     Clint2
       pacman  Michael           5   Michael5
       tetris    Clint           2     Clint2
 ageofempires Thatcher          12 Thatcher12
    goldeneye   Dexter           2    Dexter2

But is there something more elegant?

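I think this can be achieved with dplyr. First, use filter to retrieve the rows where Games is goldeneye. Then use inner_join to join those rows back to the original data on People and Hoursplayed. Because both tables have a Games column, the join suffixes them as Games.x (from the filtered rows) and Games.y (from the full data); Games.y is the one you want. Optionally, select the columns you need and arrange by People.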

library(dplyr)
Gamedat %>% 
  filter(Games == "goldeneye") %>% 
  inner_join(Gamedat, by = c("People", "Hoursplayed")) %>% 
  select(Games = Games.y, People, Hoursplayed) %>% 
  arrange(People)

Result:

          Games   People Hoursplayed
1     goldeneye    Clint           2
2        tetris    Clint           2
3     goldeneye   Dexter          12
4     goldeneye   Dexter          15
5        pacman   Dexter           2
6     goldeneye   Dexter           2
7     goldeneye  Michael           5
8        pacman  Michael           5
9     goldeneye Thatcher           8
10    goldeneye Thatcher          12
11 ageofempires Thatcher          12
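
If you would rather keep the original row order and avoid the Games.x / Games.y renaming, a filtering join may be a closer fit. The following is a sketch rather than a definitive solution: it rebuilds the example data from the question and uses dplyr's semi_join, which keeps the rows of the first table that have a match in the second without adding columns. The names goldeneye_rows and matched are hypothetical.

```r
library(dplyr)

# Rebuild the example data from the question
Gamedat <- data.frame(
  Games = c("goldeneye", "goldeneye", "goldeneye", "goldeneye", "pacman",
            "tetris", "tetris", "goldeneye", "pacman", "goldeneye",
            "pacman", "pacman", "pacman", "tetris", "tetris",
            "ageofempires", "pacman", "ageofempires", "ageofempires",
            "goldeneye"),
  People = c("Michael", "Thatcher", "Dexter", "Dexter", "Dexter",
             "Clint", "Dexter", "Thatcher", "Thatcher", "Clint",
             "Michael", "Michael", "Clint", "John", "Clint",
             "Clint", "Dexter", "Thatcher", "John", "Dexter"),
  Hoursplayed = rep(c(5, 8, 12, 15, 2), times = 4),
  stringsAsFactors = FALSE
)

# Rows for the game of interest
goldeneye_rows <- filter(Gamedat, Games == "goldeneye")

# Keep every row of Gamedat whose (People, Hoursplayed) pair
# also occurs among the goldeneye rows
matched <- semi_join(Gamedat, goldeneye_rows,
                     by = c("People", "Hoursplayed"))

matched
```

Since semi_join only filters, each row of Gamedat appears at most once in matched, even if the goldeneye rows contained the same (People, Hoursplayed) pair more than once.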