如何在给定日期之前投射具有缺失值的 data.table

How to dcast a data.table with missing values before given date

 DT <- data.table(Id = c(1, 1, 1, 1, 10, 100, 100, 101, 101, 101), 
                 Date = as.Date(c("1997-01-01", "1997-01-02", "1997-01-03", "1997-01-04", 
                                  "1997-01-02", "1997-01-02", "1997-01-04", "1997-01-03", 
                                  "1997-01-04", "1997-01-04")), 
                 Price = c(29, 25, 14, 26, 30, 16, 13, 62, 12,  6), 
                 IsFirst = c(T,F,F,F,T,T,F,T,F,F))

id:客户id;日期:交易日期;价格:交易金额; isFirst: TRUE 是交易是客户第一。每个客户有 1 个 IsFirst == TRUE,和 [0,Inf) IsFirst == FALSE。

>DT
     Id       Date Price IsFirst
 1:   1 1997-01-01    29    TRUE
 2:   1 1997-01-02    25   FALSE
 3:   1 1997-01-03    14   FALSE
 4:   1 1997-01-04    26   FALSE
 5:  10 1997-01-02    30    TRUE
 6: 100 1997-01-02    16    TRUE
 7: 100 1997-01-04    13   FALSE
 8: 101 1997-01-03    62    TRUE
 9: 101 1997-01-04    12   FALSE
10: 101 1997-01-04     6   FALSE

我需要把它转换成

    Id 1997-01-01 1997-01-02 1997-01-03 1997-01-04
1:   1         29         25         14         25
2:  10         NA         30          0          0
3: 100         NA         16          0         13
4: 101         NA         NA         62         18

NA 值应仅在客户进行 his/her 首次交易之前出现。在第一笔交易后,缺失值应该用0填充。我试过:

dcast.data.table(DT, Id ~ Date, fun = sum, value.var = "Price", fill = NA)

但这不起作用,因为它用 NA 填充了所有缺失的字段。目前我正在使用循环遍历所有 ID,手动将客户 IsFirst 之前的字段设置为 NA:

DT2 <- dcast.data.table(DT, Id ~ Date, fun = sum, value.var = "Price")

Ids <- unique(DT$Id)
for(id in Ids){
  if(DT[(Id == id & IsFirst == T),]$Date > as.Date(names(DT2)[2])){  
    DT2[Id == id, 2:(which(names(DT2)==as.character(DT[(Id == id & IsFirst == T),]$Date))-1) := NA, with = F]
  }
}

当我的数据变大时,这会非常慢。最快最有效的方法是什么?

熔铸

dDT <- dcast(DT, Id~Date, sum, value.var="Price")
setDT(dDT) # if not using data.table 1.9.5+

mDT <- melt(dDT,id.vars = c("Id"), variable.name="Date", value.name="Price")
mDT[, `:=`(idi = 1:.N, first_sale = which.max(!!Price)), by=Id]
mDT[ idi < first_sale, Price := NA_real_ ]

res <- dcast(mDT, Id~Date, sum, value.var="Price")

合并演员

setkey(DT,Id,Date)
mergeDT <- DT[, .(Price=sum(Price)), by=key(DT)][CJ(unique(Id),unique(Date))]
mergeDT[, ok := cumsum(!is.na(Price)) > 0, by=Id]
mergeDT[ok & is.na(Price), Price := 0]

res2 <- dcast(mergeDT, Id~Date, value.var="Price")

我发现这种方式更直观,在投射之前以长格式完成所有操作。