从宽到长 data.frame 合并两对 (key/value) 列
Wide to long data.frame merging two pairs (key/value) of columns
我有这个data.frame
set.seed(28100)
label_1 <- sample(c('first_col','second_col'), 10, replace = T)
dat <- data.frame(label_1,
value_1 = sample(1:100, 10, replace = T),
label_2 = sapply(label_1, FUN = function(x) ifelse(x == 'first_col', 'second_col', 'first_col')),
value_2 = sample(1:100, 10, replace = T))
head(dat)
label_1 value_1 label_2 value_2
1 first_col 88 second_col 84
2 first_col 40 second_col 30
3 first_col 98 second_col 32
4 second_col 80 first_col 64
5 first_col 34 second_col 43
6 second_col 52 first_col 10
两对 key/value 列的顺序不一致。我想将相同的数据重新整形为长格式data.frame,例如:
desired_dat <- data.frame(first_col = rep(NA, 10),
second_col = rep(NA, 10))
是否建议使用 reshape2
或 tidyr
来解决这个问题?具体如何?
这是一个可能的解决方案;但不是最优雅的。
myFun <- function(label1, value1, label2, value2, which_label) {
return(ifelse(label1 == which_label, value1, value2))
}
desired_dat <-
data.frame(first_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'first_col'), SIMPLIFY = TRUE),
second_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'second_col'), SIMPLIFY = TRUE))
head(desired_dat)
first_col second_col
1 88 84
2 40 30
3 98 32
4 64 80
5 34 43
6 10 52
我会使用 data.table
来做到这一点,尽管同样的原则也可以应用于 tidyverse
library(data.table)
## Setting as a data.table, and adding an 'id' value to keep track of rows
setDT(dat)
dat[, id := .I]
## then 'rbinding' the _1 and _2 columns together, with common column names
dat2 <- rbindlist(
list(
dat[, .(id, label = label_1, value = value_1)],
dat[, .(id, label = label_2, value = value_2)]
)
)
## the reshaping from long to wide to give you your desired result
dcast(dat2, formula = id ~ label)
# id first_col second_col
# 1: 1 88 84
# 2: 2 40 30
# 3: 3 98 32
# 4: 4 64 80
# 5: 5 34 43
# 6: 6 10 52
# 7: 7 23 85
# 8: 8 65 86
# 9: 9 4 35
# 10: 10 83 8
只使用 dplyr
怎么样(不需要 tidyr
等)?
library(dplyr)
dat %>% transmute(first_col = if_else(label_1 == "first_col", value_1, value_2),
second_col = if_else(label_2 == "second_col", value_2, value_1))
#> first_col second_col
#> 1 88 84
#> 2 40 30
#> 3 98 32
#> 4 64 80
#> 5 34 43
#> 6 10 52
#> 7 23 85
#> 8 65 86
#> 9 4 35
#> 10 83 8
这基本上是@SymbolixAU 的解决方案,只是翻译成 dplyr
:
# Create an ID for each row: probably not necessary but useful to check
dat <- dat %>%
mutate(id = row_number())
dat_long <- bind_rows(
dat %>% select(id, label = label_1, value = value_1),
dat %>% select(id, label = label_2, value = value_2)
)
output <- dat_long %>%
spread(label, value)
从版本 v1.9.6(CRAN 2015 年 9 月 19 日)开始,data.table
可以同时 melt()
多个列。所以这是一个 data.table
表达式链:
library(data.table)
as.data.table(dat)[, rn := .I][
, melt(.SD, measure.vars = patterns("label", "value"))][
, dcast(.SD, rn ~ value1)][, -"rn"]
first_col second_col
1: 88 84
2: 40 30
3: 98 32
4: 64 80
5: 34 43
6: 10 52
7: 23 85
8: 65 86
9: 4 35
10: 83 8
我有这个data.frame
set.seed(28100)
label_1 <- sample(c('first_col','second_col'), 10, replace = T)
dat <- data.frame(label_1,
value_1 = sample(1:100, 10, replace = T),
label_2 = sapply(label_1, FUN = function(x) ifelse(x == 'first_col', 'second_col', 'first_col')),
value_2 = sample(1:100, 10, replace = T))
head(dat)
label_1 value_1 label_2 value_2
1 first_col 88 second_col 84
2 first_col 40 second_col 30
3 first_col 98 second_col 32
4 second_col 80 first_col 64
5 first_col 34 second_col 43
6 second_col 52 first_col 10
两对 key/value 列的顺序不一致。我想将相同的数据重新整形为长格式data.frame,例如:
desired_dat <- data.frame(first_col = rep(NA, 10),
second_col = rep(NA, 10))
是否建议使用 reshape2
或 tidyr
来解决这个问题?具体如何?
这是一个可能的解决方案;但不是最优雅的。
myFun <- function(label1, value1, label2, value2, which_label) {
return(ifelse(label1 == which_label, value1, value2))
}
desired_dat <-
data.frame(first_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'first_col'), SIMPLIFY = TRUE),
second_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'second_col'), SIMPLIFY = TRUE))
head(desired_dat)
first_col second_col
1 88 84
2 40 30
3 98 32
4 64 80
5 34 43
6 10 52
我会使用 data.table
来做到这一点,尽管同样的原则也可以应用于 tidyverse
library(data.table)
## Setting as a data.table, and adding an 'id' value to keep track of rows
setDT(dat)
dat[, id := .I]
## then 'rbinding' the _1 and _2 columns together, with common column names
dat2 <- rbindlist(
list(
dat[, .(id, label = label_1, value = value_1)],
dat[, .(id, label = label_2, value = value_2)]
)
)
## the reshaping from long to wide to give you your desired result
dcast(dat2, formula = id ~ label)
# id first_col second_col
# 1: 1 88 84
# 2: 2 40 30
# 3: 3 98 32
# 4: 4 64 80
# 5: 5 34 43
# 6: 6 10 52
# 7: 7 23 85
# 8: 8 65 86
# 9: 9 4 35
# 10: 10 83 8
只使用 dplyr
怎么样(不需要 tidyr
等)?
library(dplyr)
dat %>% transmute(first_col = if_else(label_1 == "first_col", value_1, value_2),
second_col = if_else(label_2 == "second_col", value_2, value_1))
#> first_col second_col
#> 1 88 84
#> 2 40 30
#> 3 98 32
#> 4 64 80
#> 5 34 43
#> 6 10 52
#> 7 23 85
#> 8 65 86
#> 9 4 35
#> 10 83 8
这基本上是@SymbolixAU 的解决方案,只是翻译成 dplyr
:
# Create an ID for each row: probably not necessary but useful to check
dat <- dat %>%
mutate(id = row_number())
dat_long <- bind_rows(
dat %>% select(id, label = label_1, value = value_1),
dat %>% select(id, label = label_2, value = value_2)
)
output <- dat_long %>%
spread(label, value)
从版本 v1.9.6(CRAN 2015 年 9 月 19 日)开始,data.table
可以同时 melt()
多个列。所以这是一个 data.table
表达式链:
library(data.table)
as.data.table(dat)[, rn := .I][
, melt(.SD, measure.vars = patterns("label", "value"))][
, dcast(.SD, rn ~ value1)][, -"rn"]
first_col second_col 1: 88 84 2: 40 30 3: 98 32 4: 64 80 5: 34 43 6: 10 52 7: 23 85 8: 65 86 9: 4 35 10: 83 8