Stack data frame columns with two distinct suffixes into two columns, preferably using tidyverse

Question

Suppose I have a list of data frames, mylist, and I want to apply the same operation to each data frame.

Say my data frame looks like this:

library(tidyverse)  # provides tibble(), pivot_longer(), str_remove(), map()
set.seed(1)
test.tbl <- tibble(
  case1_diff = rnorm(10,0),
  case1_avg = rnorm(10,0),
  case2_diff = rnorm(10,0),
  case2_avg = rnorm(10,0),
  case3_diff = rnorm(10,0),
  case3_avg = rnorm(10,0),
  case4_diff = rnorm(10,0),
  case4_avg = rnorm(10,0),
)

> head(test.tbl)
# A tibble: 6 x 8
  case1_diff case1_avg case2_diff case2_avg case3_diff case3_avg case4_diff case4_avg
       <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>
1     -0.626    1.51       0.919     1.36       -0.165     0.398     2.40       0.476
2      0.184    0.390      0.782    -0.103      -0.253    -0.612    -0.0392    -0.710
3     -0.836   -0.621      0.0746    0.388       0.697     0.341     0.690      0.611
4      1.60    -2.21      -1.99     -0.0538      0.557    -1.13      0.0280    -0.934
5      0.330    1.12       0.620    -1.38       -0.689     1.43     -0.743     -1.25 
6     -0.820   -0.0449    -0.0561   -0.415      -0.707     1.98      0.189      0.291

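I would like to stack them into two columns, diff and avg, as a 40 x 2 data frame.

Normally, I would split it into two objects with select(ends_with("diff")) and select(ends_with("avg")), pivot them, and then bind_rows.

However, since my original object is a list, I would like to do it with map, something like: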

mylist %>%
   map(*insertfunction1*) %>%
   map(*insertfunction2*)

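This means I need to do it without splitting the data frame apart. I also need to make sure that each diff stays correctly paired with its avg.

What I have tried so far is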

test.tbl %>%
  pivot_longer(cols=everything(),
               names_to = "metric") %>%
  mutate(metric = str_remove(metric,"[0-9]+")) %>%
  pivot_wider(id_cols=metric,
              values_from=value)

Answer 1

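We do not need both pivot_longer and pivot_wider. This can be done within pivot_longer alone by specifying the names_to and names_sep arguments. The special ".value" entry in names_to tells pivot_longer to use the part of each column name after the separator (diff or avg) as the output column names, so each diff value stays paired with the avg value from the same case and row.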

library(dplyr)
library(tidyr)
test.tbl %>% 
     pivot_longer(cols = everything(), names_to = c('grp', '.value'),
            names_sep = "_") %>%
     select(-grp)

-output

# A tibble: 40 x 2
#      diff    avg
#     <dbl>  <dbl>
# 1 -0.626   1.51 
# 2  0.919   1.36 
# 3 -0.165   0.398
# 4  2.40    0.476
# 5  0.184   0.390
# 6  0.782  -0.103
# 7 -0.253  -0.612
# 8 -0.0392 -0.710
# 9 -0.836  -0.621
#10  0.0746  0.388
# … with 30 more rows
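
Since the question starts from a list of data frames, the same pivot_longer call can be wrapped in a function and passed to map. The following is only a minimal sketch: stack_cases is a hypothetical helper name, and the small mylist below is made-up stand-in data that follows the caseN_diff / caseN_avg naming scheme from the question.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper (not from the original post): stack the caseN_diff /
# caseN_avg columns of one data frame into two columns, diff and avg.
stack_cases <- function(df) {
  df %>%
    pivot_longer(cols = everything(),
                 names_to = c("grp", ".value"),
                 names_sep = "_") %>%
    select(-grp)
}

# Stand-in for the question's mylist: made-up values chosen so that
# avg is always 10 * diff, which makes the pairing easy to check by eye.
mylist <- list(
  a = tibble(case1_diff = c(1, 2), case1_avg = c(10, 20),
             case2_diff = c(3, 4), case2_avg = c(30, 40)),
  b = tibble(case1_diff = c(5, 6), case1_avg = c(50, 60),
             case2_diff = c(7, 8), case2_avg = c(70, 80))
)

# Apply the same reshaping to every data frame in the list.
stacked <- mylist %>% map(stack_cases)
stacked$a
```

Each element of stacked is a two-column tibble. Because the values are stacked row by row (all cases of row 1, then all cases of row 2, and so on), a diff and its avg always come from the same case and the same original row. Dropping select(-grp) from the helper keeps the case label if it is needed later.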
