R - how to avoid repeating filter & row bind
Because I am working with a very large dataset, I need to slice it by group in order to run my computations.
I have a person-period (melt) dataset that looks like this
   group id var time
1      A  1   a    1
2      A  1   b    2
3      A  1   a    3
4      A  2   b    1
5      A  2   b    2
6      A  2   b    3
7      B  1   a    1
8      B  1   a    2
9      B  1   a    3
10     B  2   c    1
11     B  2   c    2
12     B  2   c    3
I need to do this simple transformation
library(reshape2)
library(dplyr)
dt %>% dcast(group + id ~ time, value.var = 'var')
in order to get
  group id 1 2 3
1     A  1 a b a
2     A  2 b b b
3     B  1 a a a
4     B  2 c c c
So far, so good.
But because my dataset is too large, I need to do this separately for each group, for example
a = dt %>% filter(group == 'A') %>% dcast(group + id ~ time, value.var ='var')
b = dt %>% filter(group == 'B') %>% dcast(group + id ~ time, value.var = 'var')
bind_rows(a,b)
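My problem is that I want to avoid doing this by hand, that is, having to store each group separately: a = ..., b = ..., c = ..., and so on.
Any idea how to use a single pipe flow to split out each group, compute the transformation, and put the results back into one data frame?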
dt = structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), var = structure(c(1L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c("a",
"b", "c"), class = "factor"), time = structure(c(1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor")), .Names = c("group", "id",
"var", "time"), row.names = c(NA, -12L), class = "data.frame")
lapply is your friend:
do.call(rbind, lapply(unique(dt$group), function(grp, dt){
  dt %>% filter(group == grp) %>% dcast(group + id ~ time, value.var = "var")
}, dt = dt))
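A variant of the same idea, as a minimal sketch rather than the answer's own code: split() partitions the data frame once, so filter() does not have to rescan the full data for every group. The sample data is rebuilt with data.frame() so the block runs on its own, reshape2 is assumed to be installed, and cast_one and res are hypothetical names used only for illustration.

```r
library(reshape2)

# Rebuild the question's sample data so the block is self-contained
dt <- data.frame(
  group = factor(rep(c("A", "B"), each = 6)),
  id    = factor(rep(rep(c("1", "2"), each = 3), times = 2)),
  var   = factor(c("a", "b", "a", "b", "b", "b", "a", "a", "a", "c", "c", "c")),
  time  = factor(rep(c("1", "2", "3"), times = 4))
)

# Hypothetical helper: cast one group's rows from long to wide format
cast_one <- function(d) dcast(d, group + id ~ time, value.var = "var")

# split() partitions dt once by group, lapply() casts each piece,
# and do.call(rbind, ...) stacks the pieces back into one data frame
res <- do.call(rbind, lapply(split(dt, dt$group), cast_one))

# Reset the row names to a plain 1..n sequence
rownames(res) <- NULL
```

Because the groups are processed one at a time, only one group's wide table is being built at any moment, which is the point of slicing a large dataset this way.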
The purrr package can be used to work with lists. First split the dataset by group, then use map_df to dcast each list element and return everything in a single data.frame.
library(purrr)
dt %>%
split(.$group) %>%
map_df(~dcast(.x, group + id ~ time, value.var = "var"))
  group id 1 2 3
1     A  1 a b a
2     A  2 b b b
3     B  1 a a a
4     B  2 c c c
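In purrr 1.0.0 and later, map_df() is marked as superseded in favour of map() followed by list_rbind(). The following is a minimal sketch of the same pipeline in that style, assuming purrr >= 1.0.0 plus reshape2 and dplyr are installed. The sample data is rebuilt so the block runs on its own, and wide is a hypothetical name.

```r
library(reshape2)
library(dplyr)   # loaded here for the %>% pipe, as in the question
library(purrr)   # assumption: purrr >= 1.0.0, which provides list_rbind()

# Rebuild the question's sample data so the block is self-contained
dt <- data.frame(
  group = factor(rep(c("A", "B"), each = 6)),
  id    = factor(rep(rep(c("1", "2"), each = 3), times = 2)),
  var   = factor(c("a", "b", "a", "b", "b", "b", "a", "a", "a", "c", "c", "c")),
  time  = factor(rep(c("1", "2", "3"), times = 4))
)

wide <- dt %>%
  split(.$group) %>%                                         # one data frame per group
  map(~ dcast(.x, group + id ~ time, value.var = "var")) %>% # cast each group to wide
  list_rbind()                                               # stack into one data frame
```

On older purrr versions list_rbind() is not available, and the map_df() call shown above remains the way to get the same result.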