r + dplyr filtering out time series

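I have some data tracking a group of people and the fruit they eat over time. I want to use dplyr to look at each person up until they eat a banana, and summarise all the fruit they ate up to their first banana.

Data: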

data <-  structure(list(user = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 
    1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 9584L, 9584L, 9584L, 
    9584L, 9584L, 9584L, 9584L, 9584L, 9584L, 4758L, 4758L, 4758L, 
    4758L, 4758L, 4758L), site = structure(c(1L, 6L, 1L, 1L, 6L, 
    5L, 5L, 3L, 4L, 1L, 2L, 6L, 1L, 6L, 5L, 5L, 3L, 2L, 6L, 6L, 6L, 
    4L, 2L, 5L, 5L, 4L, 2L), .Label = c("apple", "banana", "lemon", 
    "lime", "orange", "pear"), class = "factor"), time = c(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 5L, 6L, 7L, 8L, 9L, 10L), int = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("banana", 
    "other"), class = "factor")), .Names = c("user", "site", "time", 
    "int"), row.names = c(NA, -27L), class = "data.frame")

My initial thought was to group the data to find the first time each user eats a banana:

data <- data %>% transform(var = ifelse(site=="banana", 'banana','other'))

data_ban <- data %>% 
    filter(var=='banana') %>% 
    group_by(user, var, time) %>%
    group_by(user) %>%
    summarise(first_banana = min(time))

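But now I am stuck on how to apply this back to the original "data" data frame and set a filter that says: for each user, only include the rows before the time given in data_ban. Any ideas?

Something like this: group by user and keep the rows whose time is at or before that user's first banana.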

> data %>% group_by(user) %>% filter( time <= time[which(site=="banana")[1]] )
Source: local data frame [19 x 4]
Groups: user

   user   site time    int
1  1234  apple    1  other
2  1234   pear    2  other
3  1234  apple    3  other
4  1234  apple    4  other
5  1234   pear    5  other
6  1234 orange    6  other
7  1234 orange    7  other
8  1234  lemon    8  other
9  1234   lime    9  other
10 1234  apple   10  other
11 1234 banana   11 banana
12 9584  apple    1  other
13 9584   pear    2  other
14 9584 orange    3  other
15 9584 orange    4  other
16 9584  lemon    5  other
17 9584 banana    6 banana
18 4758   lime    5  other
19 4758 banana    6 banana
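
A grouped filter that indexes the first banana will drop every row for a user who never eats one. If such users can occur, a cumulative count of earlier bananas may be a safer condition. The following is only a sketch on hypothetical toy data (the `toy` data frame and its users "a" and "b" are not from the question), assuming a dplyr version that supports `arrange(.by_group = TRUE)`:

```r
library(dplyr)

# Hypothetical toy data: user "b" never eats a banana,
# and user "a" has times that do not start at 1.
toy <- data.frame(
  user = c("a", "a", "a", "a", "b", "b"),
  site = c("apple", "banana", "pear", "banana", "lime", "pear"),
  time = c(5L, 6L, 7L, 8L, 1L, 2L),
  stringsAsFactors = FALSE
)

up_to_banana <- toy %>%
  group_by(user) %>%
  arrange(time, .by_group = TRUE) %>%
  # Count the bananas seen on earlier rows. A count of zero means the
  # first banana has not been passed yet, so the banana row itself is kept.
  filter(cumsum(lag(site == "banana", default = FALSE)) == 0) %>%
  ungroup()

print(up_to_banana)
```

Here user "a" keeps the rows at times 5 and 6, and user "b" keeps both rows because no banana ever appears.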

Otherwise, maybe an anti_join.

You could try slice:
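
One way the anti_join idea might look, reusing the per-user first banana time from the question's `data_ban`. This is a sketch rather than a definitive solution. The data frame is rebuilt in compact form with only the user, site and time columns, and the names `after_banana` and `up_to_banana` are made up for illustration:

```r
library(dplyr)

# Compact rebuild of the question's data (same users, fruit, and times).
data <- data.frame(
  user = rep(c(1234L, 9584L, 4758L), times = c(12, 9, 6)),
  site = c("apple", "pear", "apple", "apple", "pear", "orange", "orange",
           "lemon", "lime", "apple", "banana", "pear",
           "apple", "pear", "orange", "orange", "lemon", "banana",
           "pear", "pear", "pear",
           "lime", "banana", "orange", "orange", "lime", "banana"),
  time = c(1:12, 1:9, 5:10),
  stringsAsFactors = FALSE
)

# First banana time per user, as in the question's data_ban.
data_ban <- data %>%
  filter(site == "banana") %>%
  group_by(user) %>%
  summarise(first_banana = min(time))

# Rows that come strictly after each user's first banana.
after_banana <- data %>%
  inner_join(data_ban, by = "user") %>%
  filter(time > first_banana)

# Remove those rows from the original data.
up_to_banana <- data %>%
  anti_join(after_banana, by = c("user", "time"))
```

Because users without a banana never appear in `data_ban`, none of their rows end up in `after_banana`, so the anti_join keeps them in full.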

data %>%
     group_by(user) %>% 
     slice(1:(which(int=='banana')[1L]))
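
The question also asks for a summary of the fruit eaten up to the first banana. A possible follow-up to the slice approach is to count fruit per user afterwards. The sketch below rebuilds the data in compact form without the int column, so it tests site instead. It assumes that rows are already ordered by time within each user, that every user eats at least one banana (otherwise `which(...)[1L]` is NA and the slice fails), and that the dplyr version supports `count(name = )`. The names `fruit_summary` and `n_eaten` are illustrative:

```r
library(dplyr)

# Compact rebuild of the question's data (same users, fruit, and times).
data <- data.frame(
  user = rep(c(1234L, 9584L, 4758L), times = c(12, 9, 6)),
  site = c("apple", "pear", "apple", "apple", "pear", "orange", "orange",
           "lemon", "lime", "apple", "banana", "pear",
           "apple", "pear", "orange", "orange", "lemon", "banana",
           "pear", "pear", "pear",
           "lime", "banana", "orange", "orange", "lime", "banana"),
  time = c(1:12, 1:9, 5:10),
  stringsAsFactors = FALSE
)

fruit_summary <- data %>%
  group_by(user) %>%
  # Keep rows 1 through the position of the first banana in each group.
  slice(1:(which(site == "banana")[1L])) %>%
  ungroup() %>%
  # One row per user and fruit, with the number of times it was eaten.
  count(user, site, name = "n_eaten")

print(fruit_summary)
```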