Cumulating the entries in a column by group and consecutively by date, in a data.table

Question

My data looks as follows (package: data.table):

  library(data.table)
  DT <- data.table(Id = c(1,1,1,1,2,2,2,2,1,1), Time = c(0,0,0,0,0,0,0,0,1,1),
             Date = as.Date(c("20000101", "20000102", "20000103", "20000104", "20000101",
                              "20000102","20000103","20000103", "20000201", "20000201"), "%Y%m%d"),
             Price = c(0,1,0,3,2,0,4,5,2,3))
 > DT
    Id Time       Date Price
 1:  1    0 2000-01-01     0
 2:  1    0 2000-01-02     1
 3:  1    0 2000-01-03     0
 4:  1    0 2000-01-04     3
 5:  2    0 2000-01-01     2
 6:  2    0 2000-01-02     0
 7:  2    0 2000-01-03     4
 8:  2    0 2000-01-03     5
 9:  1    1 2000-02-01     2
10:  1    1 2000-02-01     3

Price needs to be cumulated by Time and Id, consecutively in Date order, so that the output looks like this:

    Id Time       Date Price Cum.price
 1:  1    0 2000-01-01     0         0
 2:  1    0 2000-01-02     1         1
 3:  1    0 2000-01-03     0         1
 4:  1    0 2000-01-04     3         4
 5:  2    0 2000-01-01     2         2
 6:  2    0 2000-01-02     0         2
 7:  2    0 2000-01-03     4         6
 8:  2    0 2000-01-03     5        11
 9:  1    1 2000-02-01     2         2
10:  1    1 2000-02-01     3         5

Further info: the data.table is filled so that it contains 1 entry per day per Id per Time. There are no missing values in Price.

I can think of many ways to solve this with loops, but is there a really efficient way to do it with data.table that also processes large data.tables quickly?

Answer 1

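You can group by the 'Time' and 'Id' columns and take the cumsum of the 'Price' column, with the rows ordered by the 'Date' column. Putting order(Date) in i makes the running sum follow date order within each group without physically reordering DT: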

DT[order(Date), Cum.price := cumsum(Price), by = .(Time, Id)]
DT
#     Id Time       Date Price Cum.price
# 1:  1    0 2000-01-01     0         0
# 2:  1    0 2000-01-02     1         1
# 3:  1    0 2000-01-03     0         1
# 4:  1    0 2000-01-04     3         4
# 5:  2    0 2000-01-01     2         2
# 6:  2    0 2000-01-02     0         2
# 7:  2    0 2000-01-03     4         6
# 8:  2    0 2000-01-03     5        11
# 9:  1    1 2000-02-01     2         2
#10:  1    1 2000-02-01     3         5
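
The example data happens to be stored in date order within each group already, so it does not show what order(Date) contributes. The following is a minimal sketch on a hypothetical toy table (DT2 and its values are made up for illustration, not taken from the question) whose rows are deliberately out of date order. It compares the order(Date) approach with one possible alternative, physically sorting a copy with setorder() before a plain grouped cumsum.

```r
library(data.table)

# Hypothetical toy data, deliberately stored out of date order
DT2 <- data.table(
  Id    = c(1, 1, 1, 2, 2),
  Time  = c(0, 0, 0, 0, 0),
  Date  = as.Date(c("2000-01-03", "2000-01-01", "2000-01-02",
                    "2000-01-02", "2000-01-01")),
  Price = c(5, 1, 2, 4, 3)
)

# order(Date) in i: the running sum is taken in date order within each
# (Time, Id) group, and := writes each value back to its original row
DT2[order(Date), Cum.price := cumsum(Price), by = .(Time, Id)]

# Alternative for comparison: sort a copy by reference, then use a plain
# grouped cumsum that relies on the physical row order
sorted <- copy(DT2)
setorder(sorted, Time, Id, Date)
sorted[, Cum.check := cumsum(Price), by = .(Time, Id)]

print(DT2)
print(sorted)
```

In DT2 the rows stay where they were and only the new column reflects date order, while in sorted both columns agree because the rows themselves are in date order. When several rows of one group share the same Date, as in the question's data, the running total across those tied rows follows their existing row order, so add a tie-breaking column to order() if that order matters.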
