Delete observations with specific row order
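I have an activity-based dataset from a game that I want to aggregate into game sessions. In some cases I observe a game resume that is immediately followed by a game start or a game close. Since these are not meaningful game sessions, I want to remove these observations from my dataset: every game resume that is followed by a) a game start or b) a game close. As the expected output below shows, the start or close row that follows such a resume should be removed as well.
Simplified example data: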
game_da = data.frame(activity = c("gamestart", "activity1", "activity2", "gameclose", "gameresume", "gameclose", "gameresume", "activity1", "gameclose"))
game_da
activity
1 gamestart
2 activity1
3 activity2
4 gameclose
5 gameresume
6 gameclose
7 gameresume
8 activity1
9 gameclose
Expected output:
game_da2 = data.frame(activity = c("gamestart", "activity1", "activity2", "gameclose", "gameresume", "activity1", "gameclose"))
game_da2
activity
1 gamestart
2 activity1
3 activity2
4 gameclose
5 gameresume
6 activity1
7 gameclose
What I have tried (it deletes more observations than I want):
library(dplyr)
game_da3 = mutate(game_da, help_var = case_when(activity == "gameresume" |
                                                activity == "gamestart" |
                                                activity == "gameclose" ~ 1, TRUE ~ 0),
                  lead_help_var = lead(help_var),
                  diff_help_var = help_var + lead_help_var) %>%
  filter(diff_help_var != 2)
game_da3
activity help_var lead_help_var diff_help_var
1 gamestart 1 0 1
2 activity1 0 0 0
3 activity2 0 1 1
4 gameresume 1 0 1
5 activity1 0 1 1
You can use lead and lag to filter the rows:
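library(dplyr)
# Drop a 'gameresume' that is directly followed by 'gameclose'/'gamestart',
# and also the 'gameclose'/'gamestart' row that directly follows a 'gameresume'.
# default = '' pads the first/last row with a character value, so the
# comparisons never return NA; recent dplyr versions also check that
# `default` is compatible with the type of `x`.
game_da %>%
  filter(!((activity == 'gameresume' &
            lead(activity, default = '') %in% c('gameclose', 'gamestart')) |
           (lag(activity, default = '') == 'gameresume' &
            activity %in% c('gameclose', 'gamestart'))))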
# activity
#1 gamestart
#2 activity1
#3 activity2
#4 gameclose
#5 gameresume
#6 activity1
#7 gameclose
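To see why this keeps exactly the expected rows, it can help to materialize the two halves of the condition as columns before filtering. The following is only a sketch, assuming dplyr is installed; the column names resume_then_edge and edge_after_resume and the helper vector edge are made up for illustration and are not part of the answer above.

```r
library(dplyr)

game_da <- data.frame(activity = c("gamestart", "activity1", "activity2",
                                   "gameclose", "gameresume", "gameclose",
                                   "gameresume", "activity1", "gameclose"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: the events that make a preceding resume meaningless
edge <- c("gameclose", "gamestart")

flagged <- game_da %>%
  mutate(
    # a resume whose next row is a start or close
    resume_then_edge = activity == "gameresume" &
      lead(activity, default = "") %in% edge,
    # a start or close whose previous row is a resume
    edge_after_resume = activity %in% edge &
      lag(activity, default = "") == "gameresume"
  )

# Inspect the flags row by row
print(flagged)

# Keep only the rows that neither flag marks
result <- flagged %>%
  filter(!(resume_then_edge | edge_after_resume)) %>%
  select(activity)
```

With the sample data only row 5 is marked by the first flag and only row 6 by the second, so the resume in row 7 (followed by activity1) survives.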
The same logic works in data.table with shift:
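library(data.table)
# Note: setDT() converts game_da to a data.table by reference.
# fill = '' pads the shifted vector with a character value at the ends.
setDT(game_da)[!((activity == 'gameresume' &
                  shift(activity, type = 'lead', fill = '') %in% c('gameclose', 'gamestart')) |
                 (shift(activity, fill = '') == 'gameresume' &
                  activity %in% c('gameclose', 'gamestart')))]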
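If neither package is available, the same idea can be sketched in base R by building the shifted vectors by hand. This is an illustrative variant, not part of the answer above; the names a, nxt, prv, edge and remove are assumptions chosen for the example, and it assumes the data frame has at least one row.

```r
game_da <- data.frame(activity = c("gamestart", "activity1", "activity2",
                                   "gameclose", "gameresume", "gameclose",
                                   "gameresume", "activity1", "gameclose"),
                      stringsAsFactors = FALSE)

edge <- c("gameclose", "gamestart")
a <- game_da$activity
n <- length(a)

nxt <- c(a[-1], "")   # value of the next row, padded with "" at the end
prv <- c("", a[-n])   # value of the previous row, padded with "" at the start

# TRUE for a resume followed by start/close, and for that start/close row
remove <- (a == "gameresume" & nxt %in% edge) |
  (a %in% edge & prv == "gameresume")

game_da2 <- game_da[!remove, , drop = FALSE]
rownames(game_da2) <- NULL   # renumber rows 1..7 as in the expected output
game_da2
```

Padding with an empty string rather than NA keeps every comparison TRUE or FALSE, so no row is dropped by accident at the start or end of the data.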