R - how to avoid repeating filter & row bind

Because I am working with a very large dataset, I need to slice it by group to run my computations.

I have a person-period (melted) dataset that looks like this:

   group id var time
1      A  1   a    1
2      A  1   b    2
3      A  1   a    3
4      A  2   b    1
5      A  2   b    2
6      A  2   b    3
7      B  1   a    1
8      B  1   a    2
9      B  1   a    3
10     B  2   c    1
11     B  2   c    2
12     B  2   c    3

I need to do this simple transformation:

library(reshape2) 
library(dplyr) 

dt %>% dcast(group + id ~ time, value.var = 'var')

to get

  group id 1 2 3
1     A  1 a b a
2     A  2 b b b
3     B  1 a a a
4     B  2 c c c

So far, so good.

However, because my dataset is too large, I need to do this separately for each group, for example:

a = dt %>% filter(group == 'A') %>% dcast(group + id ~ time, value.var ='var')
b = dt %>% filter(group == 'B') %>% dcast(group + id ~ time, value.var = 'var')

bind_rows(a,b)

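My problem is that I want to avoid doing this by hand, that is, having to store each group separately (a = ..., b = ..., c = ..., and so on).

Any idea how to use a single pipe to split off each group, apply the transformation, and put the results back into one data frame?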

dt = structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), var = structure(c(1L, 
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("a", 
"b", "c"), class = "factor"), time = structure(c(1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1", 
"2", "3"), class = "factor")), .Names = c("group", "id", 
"var", "time"), row.names = c(NA, -12L), class = "data.frame")

lapply is your friend:

do.call(rbind, lapply(unique(dt$group), function(grp, dt){
  dt %>% filter(group == grp) %>% dcast(group + id ~ time, value.var = "var")
}, dt = dt))
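
As a variation on the same idea, here is a minimal sketch that uses base R's split() instead of filtering once per group. The small data frame below is a stand-in for the question's data with plain character/numeric columns (an assumption, the original uses factors), and the names pieces, wide and whole are only illustrative.

```r
library(reshape2)

# Small stand-in for the data in the question (same values, plain column types).
dt <- data.frame(
  group = rep(c("A", "B"), each = 6),
  id    = rep(rep(1:2, each = 3), times = 2),
  var   = c("a", "b", "a", "b", "b", "b", "a", "a", "a", "c", "c", "c"),
  time  = rep(1:3, times = 4),
  stringsAsFactors = FALSE
)

# Split into one data frame per group, reshape each piece, then stack the pieces.
pieces <- lapply(split(dt, dt$group), function(piece) {
  dcast(piece, group + id ~ time, value.var = "var")
})
wide <- do.call(rbind, pieces)
rownames(wide) <- NULL  # rbind() on a named list yields row names like "A.1"

# Reshaping everything at once, for comparison on this small example.
whole <- dcast(dt, group + id ~ time, value.var = "var")
```

Note that rbind() expects every piece to have the same columns, so this sketch only works as written when each group contains the same set of time values.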

The purrr package is handy for working with lists. First split the dataset by group, then use map_df to dcast each list element while returning everything in a single data.frame.

library(purrr)

dt %>%
    split(.$group) %>%
    map_df(~dcast(.x, group + id ~ time, value.var = "var"))

  group id 1 2 3
1     A  1 a b a
2     A  2 b b b
3     B  1 a a a
4     B  2 c c c
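
One thing that may matter on real data is a group that is missing a time point, since its reshaped piece then has fewer columns. The sketch below is an illustration on made-up data (dt_uneven is hypothetical, not from the question) and uses map() followed by dplyr's bind_rows(), which fills columns absent from a piece with NA.

```r
library(reshape2)
library(dplyr)
library(purrr)

# Hypothetical data: group B has no observation at time 3.
dt_uneven <- data.frame(
  group = c("A", "A", "A", "B", "B"),
  id    = c(1, 1, 1, 1, 1),
  var   = c("a", "b", "a", "c", "c"),
  time  = c(1, 2, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Reshape each group on its own, then stack; bind_rows() matches columns by name.
wide_uneven <- dt_uneven %>%
  split(.$group) %>%
  map(~ dcast(.x, group + id ~ time, value.var = "var")) %>%
  bind_rows()
```

Here the row for group B ends up with NA in column 3 rather than causing an error, which is usually what you want when groups are processed independently.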