R - compute rolling sum by id within specific time frame
I want to count, by id, the rows that come before the current row within the preceding 1-year window.
Here is my data:
df <- structure(list(id = c("1", "1", "1", "1",
"2", "2", "2", "2", "2", "2", "2",
"2", "2"), flag = c(1, 1, 0, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1), date = structure(c(15425, 15456, 16613,
16959, 15513, 15513, 15625, 15635, 15649, 15663, 15670, 16051,
16052), class = "Date")), sorted = "id", class = c("data.table",
"data.frame"), row.names = c(NA, -13L))
roll_sum <- c(0, 1, 0, 1, 0, 1, 2, 3, 4, 5, 6, 0, 1)
flag_sum <- c(0, 1, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1)
df_desired <- cbind(df, roll_sum) # roll_sum: number of rows excluding current row in 1 year time frame rolling
df_desired <- cbind(df_desired, flag_sum) # flag_sum: number of rows excluding current row in 1 year time frame rolling where flag was 1
Data:
    id flag       date
 1:  1    1 2012-03-26
 2:  1    1 2012-04-26
 3:  1    0 2015-06-27
 4:  1    1 2016-06-07
 5:  2    0 2012-06-22
 6:  2    0 2012-06-22
 7:  2    1 2012-10-12
 8:  2    1 2012-10-22
 9:  2    1 2012-11-05
10:  2    1 2012-11-19
11:  2    1 2012-11-26
12:  2    1 2013-12-12
13:  2    1 2013-12-13
Output:
df_desired
    id flag       date roll_sum flag_sum
 1:  1    1 2012-03-26        0        0
 2:  1    1 2012-04-26        1        1
 3:  1    0 2015-06-27        0        0
 4:  1    1 2016-06-07        1        0
 5:  2    0 2012-06-22        0        0
 6:  2    0 2012-06-22        1        0
 7:  2    1 2012-10-12        2        0
 8:  2    1 2012-10-22        3        1
 9:  2    1 2012-11-05        4        2
10:  2    1 2012-11-19        5        3
11:  2    1 2012-11-26        6        4
12:  2    1 2013-12-12        0        0
13:  2    1 2013-12-13        1        1
I tried the zoo-based solution given by G. Grothendieck in "Compute rolling sum by id variables, with missing timepoints", but it gives me an error:
Error in merge.zoo(z, g) :
series cannot be merged with non-unique index entries in a series
In addition: Warning message:
In zoo(count, date) :
I used make.index.unique and make.time.unique to make the date column unique.
Any help with an optimized solution is appreciated. Thanks.
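Not sure whether this will help given the dimensions of your data.
First, create a running index to handle the duplicate dates (a row must count only the rows that come before it, even when they share its date), and compute the date one year earlier (I think 365 is better, but it seems the OP wants 366).
Then, perform a non-equi self-join, making sure that only earlier rows are matched and that their dates fall within one year.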
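library(data.table)
setDT(df)  # re-initialise data.table internals, since df was rebuilt from dput() output
# rn: running row index that tells duplicate dates apart; oneYrAgo: start of the window
df[, c("rn", "oneYrAgo") := .(.I, date - 366)]
# non-equi self-join: for each row, match earlier rows (rn < rn) of the same id inside the window
df[df,
    .(roll_sum=.N, flag_sum=sum(flag, na.rm=TRUE)),
    on=.(date >= oneYrAgo, rn < rn, id, date <= date),
    by=.EACHI][,
        -seq_len(2L)]  # drop the first two join columns (window start and rn)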
Result:
    id       date roll_sum flag_sum
 1:  1 2012-03-26        0        0
 2:  1 2012-04-26        1        1
 3:  1 2015-06-27        0        0
 4:  1 2016-06-07        1        0
 5:  2 2012-06-22        0        0
 6:  2 2012-06-22        1        0
 7:  2 2012-10-12        2        0
 8:  2 2012-10-22        3        1
 9:  2 2012-11-05        4        2
10:  2 2012-11-19        5        3
11:  2 2012-11-26        6        4
12:  2 2013-12-12        0        0
13:  2 2013-12-13        1        1
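To sanity-check the join on a small sample, the same window logic can be spelled out as a brute-force loop in base R. This is only a sketch and not part of the original answer: it assumes the rows are already ordered by id and date as in the question, the helper name in_window and the variable window_days are made up for illustration, and it is quadratic in the number of rows, so it is not meant for large data.

```r
# Sample data copied from the question (dates given as days since 1970-01-01).
id   <- c(rep("1", 4), rep("2", 9))
flag <- c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1)
date <- as.Date(c(15425, 15456, 16613, 16959, 15513, 15513, 15625, 15635,
                  15649, 15663, 15670, 16051, 16052), origin = "1970-01-01")
window_days <- 366  # hypothetical name; same window length as used in the join

# Indices of the rows that precede row i, share its id, and fall in its window.
in_window <- function(i) {
  j <- seq_len(i - 1L)  # only earlier rows, so later duplicate dates are excluded
  j[id[j] == id[i] & date[j] >= date[i] - window_days & date[j] <= date[i]]
}

n <- length(id)
roll_sum <- vapply(seq_len(n), function(i) length(in_window(i)), integer(1))
flag_sum <- vapply(seq_len(n), function(i) sum(flag[in_window(i)]), numeric(1))

data.frame(id, date, roll_sum, flag_sum)
```

On the sample data this should give the same roll_sum and flag_sum columns as df_desired, which makes it a convenient reference when changing the window length (for example from 366 to 365).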