How to calculate cumsum based on row names stored as a list?

Question

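I have a large data frame whose first column holds alphanumeric row names. Using idx as shown below, I randomly select rows (3 here) for each column. I now need to compute the cumulative sum for each idx[i,j]. Because the data frame is large, I would prefer functions from the plyr package for the sake of computation time. Any idea how I should compute this?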

library(plyr)

V1 <- c('t14','t23','t54', 't13', 't1','t102', 't104', 't245')
V2 <- c(4.2, 5.3, 5.4,6, 7,8.5,9, 10.1)
V3 <- c(5.1, 5.1, 2.4,6.1, 7.7,5.5,1.99, 5.42)
my_df <- data.frame(V1, V2, V3)

# The following line randomly selects 3 row names for each column
idx <- lapply(integer(ncol(my_df)-1), function(...) sample(my_df$V1, 3))

Thanks.

Answer 1

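Hopefully someone else can offer a plyr solution (I don't have much experience with that package). In the meantime, here is a data.table solution, which is probably as fast as plyr (maybe faster):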

library(plyr)

V1 <- c('t14','t23','t54', 't13', 't1','t102', 't104', 't245')
V2 <- c(4.2, 5.3, 5.4,6, 7,8.5,9, 10.1)
V3 <- c(5.1, 5.1, 2.4,6.1, 7.7,5.5,1.99, 5.42)
my_df <- data.frame(V1, V2, V3, stringsAsFactors = FALSE)

# The following line randomly selects 3 row names for each column
set.seed(100) # Setting seed so that this example is reproducible
idx <- lapply(integer(ncol(my_df)-1), function(...) sample(my_df$V1, 3))

idx

# Additional code

# Import the data.table package - you'd want to move this line to the top of your code
library(data.table) 
setDT(my_df) # Cast the data.frame to data.table
setkey(my_df, V1) # Set the key for the data.table to V1

# With the key set as V1, I can just call idx[[i]] as the first argument of my_df 
# This will map each value of idx[[i]] to the appropriate row based on V1
# In the following, for the i-th vector in idx, I calculate the cumulative sum of each of V_{i + 1}
# seq_along() is used instead of 1:length(idx) so an empty idx yields no iterations
myResult = lapply(seq_along(idx), function(i) {
    my_df[idx[[i]], lapply(.SD, cumsum), .SDcols = i + 1]
})

At this point, myResult is a list (the exact values depend on which rows sample() drew):

[[1]]
     V2
1:  5.4
2: 10.7
3: 16.7

[[2]]
     V3
1:  5.1
2: 11.2
3: 13.6

We then build a single data frame from it as follows:

# Column bind to create matrix of results
myResult = do.call(cbind, myResult)

The result is:

     V2   V3
1:  5.4  5.1
2: 10.7 11.2
3: 16.7 13.6
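
For comparison, the same lookup-then-cumsum idea can be sketched in base R without data.table or plyr. This is only an illustration, not a benchmarked alternative: it uses match() to turn each vector of row names into row positions, and the idx below is hand-picked (an assumption, not the output of sample()) so that the result is deterministic. The name base_result is also just a placeholder.

```r
# Same example data as in the question.
V1 <- c('t14', 't23', 't54', 't13', 't1', 't102', 't104', 't245')
V2 <- c(4.2, 5.3, 5.4, 6, 7, 8.5, 9, 10.1)
V3 <- c(5.1, 5.1, 2.4, 6.1, 7.7, 5.5, 1.99, 5.42)
my_df <- data.frame(V1, V2, V3, stringsAsFactors = FALSE)

# Hypothetical, hand-picked row names (one vector per value column),
# used here in place of sample() so the example is reproducible.
idx <- list(c('t54', 't23', 't13'), c('t14', 't13', 't54'))

# For the i-th vector of names: find the matching row positions in V1,
# then take the cumulative sum of column i + 1 in that order.
base_result <- sapply(seq_along(idx), function(i) {
  rows <- match(idx[[i]], my_df$V1)
  cumsum(my_df[[i + 1]][rows])
})
colnames(base_result) <- names(my_df)[-1]

base_result
```

With these particular names, the V2 column accumulates 5.4, 5.3 and 6, and the V3 column accumulates 5.1, 6.1 and 2.4. Whether this is faster or slower than the keyed data.table join on a large data frame would need to be measured on the real data.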
