How to apply aggregation and rbind on a list of data.tables?

I am trying to apply this reprex to a list of many data.tables, aggregating according to several criteria. I have tried some combinations of lapply, mapply, for, ... without success.

My input data is a list of data.tables:

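library(data.table)

nb.row <- 50
nb.col <- 5
# 5 tables of random values with columns V1..V5
lst.DT <- replicate(5, as.data.table(matrix(runif(n=nb.row*nb.col, min = 0, max = 100), nb.row, nb.col)), simplify = FALSE)
# three integer grouping criteria (values 1..5), shared by every table
crit <- as.data.table(replicate(3,sample(1:5,nb.row, replace = TRUE)))
names(crit) <- c("C1", "C2", "C3")
# append C1, C2, C3 to each table
lst.DT <- lapply(lst.DT, cbind, crit)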

The repetitive code I am trying to condense and simplify:

dt1.1 <- lst.DT[[1]][, .(new = sum(V4 / V5)), by = C1]
dt1.2 <- lst.DT[[1]][, .(new = sum(V4 / V5)), by = C2]
dt1.3 <- lst.DT[[1]][, .(new = sum(V4 / V5)), by = C3]

dt2.1 <- lst.DT[[2]][, .(new = sum(V4 / V5)), by = C1]
dt2.2 <- lst.DT[[2]][, .(new = sum(V4 / V5)), by = C2]
dt2.3 <- lst.DT[[2]][, .(new = sum(V4 / V5)), by = C3]

...

dtX.1 <- lst.DT[[X]][, .(new = sum(V4 / V5)), by = C1]
dtX.2 <- lst.DT[[X]][, .(new = sum(V4 / V5)), by = C2]
dtX.3 <- lst.DT[[X]][, .(new = sum(V4 / V5)), by = C3]

res1 <- rbindlist(list(dt1.1, dt1.2, dt1.3))
res2 <- rbindlist(list(dt2.1, dt2.2, dt2.3))
...
resX <- rbindlist(list(dtX.1, dtX.2, dtX.3))

The goal is to return a list with the same length as lst.DT, containing res1, res2, ...

How can I perform this kind of operation? Thanks a lot.

An example, shown for the first table lst.DT[[1]]:

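# aggregate by each of C1, C2, C3 in turn and stack the three results;
# use.names = FALSE stacks by position, so all keys end up in a column named C1
res1a <- rbindlist(
  lapply(
    paste0('C', 1:3),
    function(Ci) lst.DT[[1]][, .(new = sum(V4 / V5)), by = Ci]
  ), 
  use.names = FALSE
)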
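To get one result per table (list(res1, res2, ...)) rather than only res1, the same expression can be wrapped in a function and applied with lapply(). This is a minimal sketch: the helper name agg_by_each is hypothetical, and set.seed() is added only so the rebuilt sample data is reproducible.

```r
library(data.table)

# Sample data rebuilt as in the question; set.seed() is added only for reproducibility
set.seed(1)
nb.row <- 50
nb.col <- 5
lst.DT <- replicate(5, as.data.table(matrix(runif(nb.row * nb.col, 0, 100), nb.row, nb.col)),
                    simplify = FALSE)
crit <- as.data.table(replicate(3, sample(1:5, nb.row, replace = TRUE)))
names(crit) <- c("C1", "C2", "C3")
lst.DT <- lapply(lst.DT, cbind, crit)

# Hypothetical helper: aggregate one table by each criterion column and stack the pieces
agg_by_each <- function(DT, cols = paste0("C", 1:3)) {
  rbindlist(
    lapply(cols, function(Ci) DT[, .(new = sum(V4 / V5)), by = Ci]),
    use.names = FALSE  # stack by position, so the C2/C3 keys land in column "C1"
  )
}

# One result per input table, i.e. list(res1, res2, ...)
res <- lapply(lst.DT, agg_by_each)
```

Because each element of res stacks three complete aggregations, the sum of its new column should be three times the table's overall sum(V4 / V5), which makes a convenient sanity check.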

Another one, using groupingsets():

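vars <- paste0('C', 1:3)
# one grouping set per criterion; columns not in the current set are filled with NA
res1b <- groupingsets(
  lst.DT[[1]], j = sum(V4 / V5), by = vars, sets = as.list(vars)
)[, .(C1 = fcoalesce(.SD), new = V1), .SDcols = vars]  # merge C1/C2/C3 into one key column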
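The groupingsets() version can be applied to the whole list in the same way. A sketch under the same assumptions (sample data rebuilt from the question, seed added for reproducibility); the body of the anonymous function is the answer's code with lst.DT[[1]] replaced by the current table.

```r
library(data.table)

# Sample data rebuilt as in the question; set.seed() is added only for reproducibility
set.seed(1)
nb.row <- 50
nb.col <- 5
lst.DT <- replicate(5, as.data.table(matrix(runif(nb.row * nb.col, 0, 100), nb.row, nb.col)),
                    simplify = FALSE)
crit <- as.data.table(replicate(3, sample(1:5, nb.row, replace = TRUE)))
names(crit) <- c("C1", "C2", "C3")
lst.DT <- lapply(lst.DT, cbind, crit)

vars <- paste0("C", 1:3)

# Run the groupingsets() answer on every table in the list
res <- lapply(lst.DT, function(DT) {
  groupingsets(DT, j = sum(V4 / V5), by = vars, sets = as.list(vars))[
    , .(C1 = fcoalesce(.SD), new = V1), .SDcols = vars]  # collapse the NA-padded keys
})
```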

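Here I offer another angle on the problem. In short, I build one list of results per grouping column (C1, C2, C3), give the key column the same name in every list, and then rbind the lists element-wise.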

library(data.table)
# for each table in list_in, sum V4/V5 grouped by the column named in col_name
sumby <- function(list_in,col_name){
    lapply(list_in, function(x) x[,.(new = sum(V4/V5)), by = col_name])
}

lt1 <- sumby(lst.DT,"C1")
lt2 <- sumby(lst.DT,"C2")
lt3 <- sumby(lst.DT,"C3")

# rename the key column of lt2 and lt3 to C1 so the names match, then rbind the three lists element-wise
lt2 <- lapply(lt2, function(x) x[,.(C1=C2,new)])
lt3 <- lapply(lt3, function(x) x[,.(C1=C3,new)])
resu1 <- mapply(rbind,lt1,lt2,lt3, SIMPLIFY=FALSE)
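The lt1/lt2/lt3 pattern can be extended to any number of criterion columns without writing one line per column. In this sketch (not the answerer's code), the helper name sumby_renamed is hypothetical: it renames the key column with setnames() right after aggregating, and do.call(Map, ...) performs the same element-wise rbind as mapply(rbind, lt1, lt2, lt3, SIMPLIFY = FALSE). setnames() modifies only the freshly created aggregate table by reference, never the tables in lst.DT.

```r
library(data.table)

# Sample data rebuilt as in the question; set.seed() is added only for reproducibility
set.seed(1)
nb.row <- 50
nb.col <- 5
lst.DT <- replicate(5, as.data.table(matrix(runif(nb.row * nb.col, 0, 100), nb.row, nb.col)),
                    simplify = FALSE)
crit <- as.data.table(replicate(3, sample(1:5, nb.row, replace = TRUE)))
names(crit) <- c("C1", "C2", "C3")
lst.DT <- lapply(lst.DT, cbind, crit)

# Hypothetical variant of sumby(): aggregate, then rename the key column to "C1"
sumby_renamed <- function(list_in, col_name) {
  lapply(list_in, function(x) {
    out <- x[, .(new = sum(V4 / V5)), by = col_name]
    setnames(out, col_name, "C1")  # by reference on 'out' only; returns 'out'
  })
}

cols <- paste0("C", 1:3)
per_col <- lapply(cols, function(cc) sumby_renamed(lst.DT, cc))

# Element-wise rbind across the per-column lists
resu1 <- do.call(Map, c(list(f = rbind), per_col))
```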