Cumulating the entries in a column by group and consecutively by date, in a data.table
My data looks like this (package: data.table):
library(data.table)

DT <- data.table(Id = c(1,1,1,1,2,2,2,2,1,1), Time = c(0,0,0,0,0,0,0,0,1,1),
                 Date = as.Date(c("20000101", "20000102", "20000103", "20000104", "20000101",
                                  "20000102", "20000103", "20000103", "20000201", "20000201"), "%Y%m%d"),
                 Price = c(0,1,0,3,2,0,4,5,2,3))
> DT
    Id Time       Date Price
 1:  1    0 2000-01-01     0
 2:  1    0 2000-01-02     1
 3:  1    0 2000-01-03     0
 4:  1    0 2000-01-04     3
 5:  2    0 2000-01-01     2
 6:  2    0 2000-01-02     0
 7:  2    0 2000-01-03     4
 8:  2    0 2000-01-03     5
 9:  1    1 2000-02-01     2
10:  1    1 2000-02-01     3
Price needs to be cumulated by Time and Id, consecutively in order of Date, so that the output looks like this:
    Id Time       Date Price Cum.price
 1:  1    0 2000-01-01     0         0
 2:  1    0 2000-01-02     1         1
 3:  1    0 2000-01-03     0         1
 4:  1    0 2000-01-04     3         4
 5:  2    0 2000-01-01     2         2
 6:  2    0 2000-01-02     0         2
 7:  2    0 2000-01-03     4         6
 8:  2    0 2000-01-03     5        11
 9:  1    1 2000-02-01     2         2
10:  1    1 2000-02-01     3         5
Further info: the data.table is populated with 1 entry per day per Id per Time. There are no missing values in Price.
I can think of many ways to solve this with loops, but is there a really efficient way to do it with data.table that also runs fast on large data.tables?
You can group by the 'Time' and 'Id' columns and take the cumsum of the 'Price' column, after ordering the rows by the 'Date' column:
DT[order(Date), Cum.price := cumsum(Price), by = .(Time, Id)]
DT
#     Id Time       Date Price Cum.price
#  1:  1    0 2000-01-01     0         0
#  2:  1    0 2000-01-02     1         1
#  3:  1    0 2000-01-03     0         1
#  4:  1    0 2000-01-04     3         4
#  5:  2    0 2000-01-01     2         2
#  6:  2    0 2000-01-02     0         2
#  7:  2    0 2000-01-03     4         6
#  8:  2    0 2000-01-03     5        11
#  9:  1    1 2000-02-01     2         2
# 10:  1    1 2000-02-01     3         5
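The example data happens to be sorted by Date within each group already, so it does not show what order(Date) contributes. Below is a minimal sketch on a small, deliberately shuffled table (DT2 and DT3 are hypothetical names and values, not part of the question). It suggests that the running sum follows Date order while the rows of the table stay where they are, and it compares this with physically sorting via setorder() first.

```r
library(data.table)

# Hypothetical input: same columns as DT, rows deliberately out of date order
DT2 <- data.table(
  Id    = c(1, 1, 1, 2, 2),
  Time  = c(0, 0, 0, 0, 0),
  Date  = as.Date(c("2000-01-03", "2000-01-01", "2000-01-02",
                    "2000-01-02", "2000-01-01")),
  Price = c(5, 1, 2, 4, 3)
)

# order(Date) in i decides the order in which cumsum() sees the rows of each
# group; := writes each result back to its original row, so DT2 is not re-sorted.
DT2[order(Date), Cum.price := cumsum(Price), by = .(Time, Id)]
DT2$Cum.price
# 8 1 3 7 3

# Alternative: sort a copy by reference first, then cumulate within groups.
DT3 <- DT2[, .(Id, Time, Date, Price)]
setorder(DT3, Time, Id, Date)
DT3[, Cum.price := cumsum(Price), by = .(Time, Id)]
DT3$Cum.price
# 1 3 8 3 7
```

Both variants give the same cumulative value for each (Id, Time, Date) row; they differ only in whether the table itself ends up re-sorted. When several rows in a group share a Date (as rows 7 and 8 in the question do), the ordering is stable as far as I know, so tied rows should be cumulated in their existing row order.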