How to calculate cumsum based on row names stored as a list?
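I have a large data frame whose first column holds alphanumeric row names. Using idx as shown below, I randomly select rows (here 3 rows) for each column. I now need to compute the cumulative sum for each idx[i,j]. My data frame is large, so for the sake of computation time, functions from the plyr package are preferred. Any idea how I should compute this?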
library(plyr)
V1 <- c('t14','t23','t54', 't13', 't1','t102', 't104', 't245')
V2 <- c(4.2, 5.3, 5.4,6, 7,8.5,9, 10.1)
V3 <- c(5.1, 5.1, 2.4,6.1, 7.7,5.5,1.99, 5.42)
my_df <- data.frame(V1, V2, V3)
# The following line randomly selects 3 rows for each column
idx <- lapply(integer(ncol(my_df)-1), function(...) sample(my_df$V1, 3))
Thanks
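Hopefully someone else can offer a plyr solution (I don't have much experience with that package). In the meantime, here is a data.table solution, which is probably as fast as plyr (perhaps faster):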
library(plyr)
V1 <- c('t14','t23','t54', 't13', 't1','t102', 't104', 't245')
V2 <- c(4.2, 5.3, 5.4,6, 7,8.5,9, 10.1)
V3 <- c(5.1, 5.1, 2.4,6.1, 7.7,5.5,1.99, 5.42)
my_df <- data.frame(V1, V2, V3, stringsAsFactors = FALSE)
# The following line randomly selects 3 rows for each column
set.seed(100) # Set a seed so that this example is reproducible
idx <- lapply(integer(ncol(my_df)-1), function(...) sample(my_df$V1, 3))
idx
# Additional code
# Import the data.table package - you'd want to move this line to the top of your code
library(data.table)
setDT(my_df) # Cast the data.frame to data.table
setkey(my_df, V1) # Set the key for the data.table to V1
# With the key set as V1, I can just call idx[[i]] as the first argument of my_df
# This will map each value of idx[[i]] to the appropriate row based on V1
# For the i-th vector in idx, compute the cumulative sum of column i + 1 (V2 for i = 1, V3 for i = 2)
myResult = lapply(seq_along(idx), function(i) {
  my_df[idx[[i]], lapply(.SD, cumsum), .SDcols = i + 1]
})
At this point, myResult is a list:
[[1]]
     V2
1:  5.4
2: 10.7
3: 16.7

[[2]]
     V3
1:  5.1
2: 11.2
3: 13.6
We then combine the list elements column-wise into a single table:
# Column-bind the list elements into one table of results
myResult = do.call(cbind, myResult)
The result is:
     V2   V3
1:  5.4  5.1
2: 10.7 11.2
3: 16.7 13.6
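For comparison, here is a minimal base R sketch of the same "look up rows by name, then take the cumulative sum" idea, which may help clarify what the keyed lookup is doing. It is not a plyr solution and makes no claim about speed. The fixed idx below is an assumption, chosen so that it reproduces the sums shown above without depending on the random number generator, and base_result is a hypothetical name.

```r
# Base R sketch: find the rows named in each element of idx, then cumsum them
my_df <- data.frame(
  V1 = c("t14", "t23", "t54", "t13", "t1", "t102", "t104", "t245"),
  V2 = c(4.2, 5.3, 5.4, 6, 7, 8.5, 9, 10.1),
  V3 = c(5.1, 5.1, 2.4, 6.1, 7.7, 5.5, 1.99, 5.42),
  stringsAsFactors = FALSE
)

# Fixed selection (an assumption for illustration) used in place of sample(),
# so the numbers do not change with the R version or the seed
idx <- list(c("t54", "t23", "t13"), c("t14", "t13", "t54"))

# For the i-th vector of names, match() returns the row positions in V1;
# column i + 1 of my_df holds the values to accumulate
base_result <- lapply(seq_along(idx), function(i) {
  rows <- match(idx[[i]], my_df$V1)
  cumsum(my_df[[i + 1]][rows])
})

# Name each result after its source column and bind into a data frame
names(base_result) <- names(my_df)[-1]
base_result <- as.data.frame(base_result)
print(base_result)
```

match() preserves the order of the names in idx[[i]], which matters here because a cumulative sum depends on the order in which rows are visited.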