Counting observations based on individual date ranges

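What I want to do is, for each individual ("id") at each sampling occasion ("date"), count the number of unique observations within the past 1 year and store it in a new column ("n_sample_1y").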

So I would like to get output like this:

# A tibble: 6 x 3
     id date                n_sample_1y
  <dbl> <dttm>                    <dbl>
1     3 2010-01-10 00:00:00           1
2     3 2010-02-15 00:00:00           2
3     3 2010-03-29 00:00:00           3
4     3 2010-03-29 00:00:00           3
5     3 2011-02-16 00:00:00           2
6     3 2011-06-13 00:00:00           2 

I have been using the lubridate package to calculate the start date of the date range ("s_date"):

mutate(s_date= date - lubridate::years(1), sample_no = match(date, unique(date)))

But I can't seem to get any further.

Any tips/ideas would be greatly appreciated. Sample data:

df <- structure(list(id = c(3, 3, 3, 3, 3, 4, 4, 4, 5, 5), 
               date = structure(c(1220572800, 1221004800, 1269820800, 1269820800, 1274227200, 1276387200, 1279756800, 1283904000, 1286668800, 1289779200), 
               tzone = "UTC", class = c("POSIXct", "POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

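Building a loop with lubridate and dplyr

First, install and load the lubridate and dplyr packages.
If I have understood your question correctly, here is the sample data and a corresponding solution.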

library(dplyr)
library(lubridate)

df <- structure(list(id = c(3, 3, 3, 3, 3, 4, 4, 4, 5, 5), 
               date = structure(c(1220572800, 1221004800, 1269820800, 1269820800, 1274227200, 1276387200, 1279756800, 1283904000, 1286668800, 1289779200), 
               tzone = "UTC", class = c("POSIXct", "POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

final_df <- data.frame()

# Process each individual separately
for (j in unique(df$id)) {
  df1 <- df %>% filter(id == j)
  df1$n_sample_1y <- NA_integer_

  for (i in seq_len(nrow(df1))) {
    # Window: from one year before the current date up to and including it
    in_window <- df1$date <= df1$date[i] & df1$date >= df1$date[i] %m-% years(1)
    # Count the distinct sampling dates that fall inside the window
    df1$n_sample_1y[i] <- n_distinct(df1$date[in_window])
  }

  # Append this individual's rows to the combined result
  final_df <- bind_rows(final_df, df1)
}
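
The same windowed count can also be written without explicit loops over ids. The following is only a sketch, not a tested drop-in replacement: `count_dates_1y` is a hypothetical helper name introduced here, and the input is rebuilt from the dates shown in the desired output in the question (assumed to be UTC), so the result can be compared against that table.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: for each date, count the distinct dates that fall
# in the preceding year (both ends of the window inclusive)
count_dates_1y <- function(dates) {
  vapply(seq_along(dates), function(i) {
    start <- dates[i] %m-% years(1)
    n_distinct(dates[dates >= start & dates <= dates[i]])
  }, integer(1))
}

# Input rebuilt from the desired output in the question (time zone assumed UTC)
example <- tibble(
  id = 3,
  date = as.POSIXct(c("2010-01-10", "2010-02-15", "2010-03-29",
                      "2010-03-29", "2011-02-16", "2011-06-13"), tz = "UTC")
)

# Apply the helper within each id so windows never mix individuals
result <- example %>%
  group_by(id) %>%
  mutate(n_sample_1y = count_dates_1y(date)) %>%
  ungroup()

print(result)
```

For these six rows the n_sample_1y column comes out as 1, 2, 3, 3, 2, 2, matching the desired output. The helper still compares every date against every other date within an id, so for very large groups a rolling join or a dedicated sliding-window package may be a better fit.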

Hope the updated code helps. Happy coding!