Counting observations based on individual date ranges
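What I want to do is, for each individual ("id") at each sampling occasion ("date"), count the number of unique observations within the past 1 year and store it in a new column ("n_sample_1y").
So I would like to achieve an output like this: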
# A tibble: 6 x 3
     id date                n_sample_1y
  <dbl> <dttm>                    <dbl>
1     3 2010-01-10 00:00:00           1
2     3 2010-02-15 00:00:00           2
3     3 2010-03-29 00:00:00           3
4     3 2010-03-29 00:00:00           3
5     3 2011-02-16 00:00:00           2
6     3 2011-06-13 00:00:00           2
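I have been using the lubridate package to compute the start date of the date range ("s_date"):
mutate(s_date = date - lubridate::years(1), sample_no = match(date, unique(date)))
But I can't seem to get any further.
Any tips/ideas would be much appreciated.
Sample data: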
df <- structure(list(id = c(3, 3, 3, 3, 3, 4, 4, 4, 5, 5),
date = structure(c(1220572800, 1221004800, 1269820800, 1269820800, 1274227200, 1276387200, 1279756800, 1283904000, 1286668800, 1289779200),
tzone = "UTC", class = c("POSIXct", "POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
Using lubridate, dplyr, and a loop
First, install and load the lubridate and dplyr libraries.
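As a rough sketch of that setup step: the snippet below installs the two packages only if they are missing, attaches them, and then shows why the loop further down uses `%m-%` rather than plain subtraction. The leap-day date is a made-up illustration, not part of the sample data.

```r
# Install each package once if it is not already available, then attach it
for (pkg in c("dplyr", "lubridate")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(dplyr)
library(lubridate)

# Illustrative date only (not from the question's data)
leap <- as.Date("2012-02-29")

# Plain period subtraction lands on 2011-02-29, which does not exist, so it is NA
plain <- leap - years(1)

# %m-% rolls back to the last valid day of the target month instead
rolled <- leap %m-% years(1)
```

With `date - years(1)`, a sample taken on 29 February would get an `NA` window start, so `%m-%` is the safer choice for this kind of look-back.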
If I understood your question correctly, here are the sample data and a corresponding solution.
library(dplyr)
library(lubridate)

df <- structure(list(id = c(3, 3, 3, 3, 3, 4, 4, 4, 5, 5),
date = structure(c(1220572800, 1221004800, 1269820800, 1269820800, 1274227200, 1276387200, 1279756800, 1283904000, 1286668800, 1289779200),
tzone = "UTC", class = c("POSIXct", "POSIXt"))), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

final_df <- data.frame()
for (j in unique(df$id)) {
  # Rows belonging to one individual
  df1 <- df %>% filter(id == j)
  df1$n_sample_1y <- NA_integer_
  for (i in seq_len(nrow(df1))) {
    # Dates of this individual from one year before row i up to and including row i
    in_window <- df1$date <= df1$date[i] & df1$date >= df1$date[i] %m-% years(1)
    # Number of distinct sampling dates inside that window
    df1$n_sample_1y[i] <- n_distinct(df1$date[in_window])
  }
  final_df <- rbind(final_df, df1)
}
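The nested loop above can probably also be written without growing a data frame row by row. The following is only a sketch under that assumption: `count_1y` is a helper name invented here for illustration, and the data are rebuilt with `tibble()` from the same epoch seconds as the sample.

```r
library(dplyr)
library(lubridate)

# Same sample data as in the question, rebuilt from the epoch seconds
df <- tibble(
  id = c(3, 3, 3, 3, 3, 4, 4, 4, 5, 5),
  date = as.POSIXct(
    c(1220572800, 1221004800, 1269820800, 1269820800, 1274227200,
      1276387200, 1279756800, 1283904000, 1286668800, 1289779200),
    origin = "1970-01-01", tz = "UTC"
  )
)

# Hypothetical helper: for each date, count the distinct dates that fall
# between one year earlier and that date (both ends inclusive)
count_1y <- function(dates) {
  vapply(seq_along(dates), function(i) {
    in_window <- dates <= dates[i] & dates >= dates[i] %m-% years(1)
    n_distinct(dates[in_window])
  }, integer(1))
}

# Apply the helper separately within each individual
result <- df %>%
  group_by(id) %>%
  mutate(n_sample_1y = count_1y(date)) %>%
  ungroup()
```

For id 4, whose three samples all fall within a few months of 2010, this gives counts of 1, 2 and 3. The work is still quadratic in the number of rows per id, which should be fine for data of this size.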
I hope the updated code helps. Happy coding!