dplyr|tidyverse: merge sets of key-value pairs into a single key-value pair (long format)
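What is the canonical dplyr or tidyverse way to merge two sets of key-value pairs?
The first key-value pair is parameter-coeft. The second key-value pair is param-value. The catch is that the values of the second pair are repeated across rows.
I want to merge them into a single key-value pair.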
library(dplyr)  # provides %>%
dat <- tidyr::crossing(sim=c(1:5),
parameter=c('mu','sigma'),
param=c('sd','sd')
) %>%
dplyr::mutate(coeft=rnorm(n=10)) %>%
dplyr::mutate(value=sort(rep(rnorm(n=5),2)))
> dat
# A tibble: 10 x 5
sim parameter param coeft value
<int> <chr> <chr> <dbl> <dbl>
1 1 mu sd -1.91 -0.601
2 1 sigma sd -0.967 -0.601
3 2 mu sd -1.95 0.0645
4 2 sigma sd 0.676 0.0645
5 3 mu sd -0.891 0.673
6 3 sigma sd -0.328 0.673
7 4 mu sd -2.30 1.08
8 4 sigma sd 0.679 1.08
9 5 mu sd -0.598 1.99
10 5 sigma sd -0.339 1.99
Desired structure:
# A tibble: 15 x 3
sim parameter coeft
<int> <chr> <dbl>
1 1 mu -1.91
2 1 sigma -0.967
3 1 sd -0.601
4 2 mu -1.95
5 2 sigma 0.676
6 2 sd 0.0645
...
If we need to reshape multiple sets of columns into 'long' format, then melt from data.table is an option:
library(data.table)
dt <- unique(melt(setDT(dat), measure = list(2:3, 4:5),
value.name = c('parameter', 'coeft')))[, variable := NULL][order(sim)]
dt
# sim parameter coeft
# 1: 1 mu -1.9100
# 2: 1 sigma -0.9670
# 3: 1 sd -0.6010
# 4: 2 mu -1.9500
# 5: 2 sigma 0.6760
# 6: 2 sd 0.0645
# 7: 3 mu -0.8910
# 8: 3 sigma -0.3280
# 9: 3 sd 0.6730
#10: 4 mu -2.3000
#11: 4 sigma 0.6790
#12: 4 sd 1.0800
#13: 5 mu -0.5980
#14: 5 sigma -0.3390
#15: 5 sd 1.9900
Data
dat <- structure(list(sim = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L
), parameter = c("mu", "sigma", "mu", "sigma", "mu", "sigma",
"mu", "sigma", "mu", "sigma"), param = c("sd", "sd", "sd", "sd",
"sd", "sd", "sd", "sd", "sd", "sd"), coeft = c(-1.91, -0.967,
-1.95, 0.676, -0.891, -0.328, -2.3, 0.679, -0.598, -0.339), value = c(-0.601,
-0.601, 0.0645, 0.0645, 0.673, 0.673, 1.08, 1.08, 1.99, 1.99)),
.Names = c("sim",
"parameter", "param", "coeft", "value"),
class = "data.frame", row.names = c(NA,
-10L))
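As a rough tidyverse counterpart to the melt call above, one could stack the two key-value pairs explicitly and drop the repeated rows. This is only an illustrative sketch, not the method of either answer: it assumes dplyr is installed, rebuilds dat with the same values as the Data block, and the names first_pair, second_pair, and res are made up for the example.

```r
library(dplyr)

# Same values as the Data block above, rebuilt compactly
dat <- data.frame(
  sim = rep(1:5, each = 2),
  parameter = rep(c("mu", "sigma"), times = 5),
  param = "sd",
  coeft = c(-1.91, -0.967, -1.95, 0.676, -0.891,
            -0.328, -2.3, 0.679, -0.598, -0.339),
  value = rep(c(-0.601, 0.0645, 0.673, 1.08, 1.99), each = 2),
  stringsAsFactors = FALSE
)

# First key-value pair: already in the target shape
first_pair <- dat %>%
  select(sim, parameter, coeft)

# Second key-value pair: rename to the target columns, then drop
# the rows that are repeated within each sim
second_pair <- dat %>%
  transmute(sim, parameter = param, coeft = value) %>%
  distinct()

# Stack both pairs and order by simulation
res <- bind_rows(first_pair, second_pair) %>%
  arrange(sim)

nrow(res)  # 10 rows from the first pair plus 5 from the second
```

The design choice here is to treat the problem as row-binding two narrow tables rather than as a reshape, which avoids a round trip through wide format.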
Here is one approach using dplyr together with spread and gather from tidyr (run with dplyr v0.7.4, Windows 7, 64-bit R):
library(dplyr)
library(tidyr) # spread() and gather() come from tidyr
dat %>%
spread(parameter, coeft) %>% #convert to wide format: sim, param, value, mu, sigma
rename(sd = value) %>% #change the name of a column
gather(parameter, coeft, c(4,5,3)) %>% #convert three non-adjacent columns to long format, note the order of columns
# gather(parameter, coeft, sd:sigma) %>% #alternative: select the same three columns as a contiguous range
arrange(sim) %>% #order of rows
select(-param)
This gives a warning on some versions of dplyr (0.7.4) but not on all of them (I will post a version without the warning tomorrow, once I have checked).
Warning:
Warning message:
In if (!is.finite(x)) return(FALSE) :
the condition has length > 1 and only the first element will be used
In this case, the following runs without a warning:
dat %>%
spread(parameter, coeft) %>%
dplyr::rename(sd = value) %>%
gather(parameter, coeft, "mu", "sigma", "sd") %>%
arrange(sim) %>% #order of rows
select(-param)
Also note that if you want to use column-exclusion notation, you need to drop the param column beforehand.
dat %>%
spread(parameter, coeft) %>% #convert to wide format
rename(sd = value) %>% #change the name of a column
select(-param) %>%
gather(parameter, coeft, -sim) %>% #convert every column except sim to long format
arrange(sim) #order of rows
#output
sim parameter coeft
<int> <chr> <dbl>
1 1 mu -0.626
2 1 sigma 0.184
3 1 sd -2.21
4 2 mu -0.836
5 2 sigma 1.60
6 2 sd -0.621
7 3 mu 0.330
8 3 sigma -0.820
9 3 sd 0.390
10 4 mu 0.487
11 4 sigma 0.738
12 4 sd 1.12
13 5 mu 0.576
14 5 sigma -0.305
15 5 sd 1.51
Data:
set.seed(1)
dat <- tidyr::crossing(sim=c(1:5),
parameter=c('mu','sigma'),
param=c('sd','sd')
) %>%
dplyr::mutate(coeft=rnorm(n=10)) %>%
dplyr::mutate(value=sort(rep(rnorm(n=5),2)))
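In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer, so the same widen, rename, lengthen idea can be written with those. The following is a hedged sketch rather than part of the original answer: it assumes a recent tidyr, uses the fixed values from the question instead of the seeded random data, and res is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Fixed values from the question, so the result is reproducible
dat <- data.frame(
  sim = rep(1:5, each = 2),
  parameter = rep(c("mu", "sigma"), times = 5),
  param = "sd",
  coeft = c(-1.91, -0.967, -1.95, 0.676, -0.891,
            -0.328, -2.3, 0.679, -0.598, -0.339),
  value = rep(c(-0.601, 0.0645, 0.673, 1.08, 1.99), each = 2),
  stringsAsFactors = FALSE
)

res <- dat %>%
  # widen the first pair: one row per sim with columns mu and sigma
  pivot_wider(names_from = parameter, values_from = coeft) %>%
  # the second pair collapses to one row per sim; name its value column sd
  select(-param) %>%
  rename(sd = value) %>%
  # lengthen all three columns back into a single key-value pair
  pivot_longer(c(mu, sigma, sd),
               names_to = "parameter", values_to = "coeft") %>%
  arrange(sim)

res  # 15 rows with columns sim, parameter, coeft
```

Selecting the columns by name in pivot_longer, as in the warning-free gather call above, also fixes the mu, sigma, sd order within each sim.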