R reshape wide to long: multiple variables, observations with multiple indices

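I have some data in which observations with multiple indices, $y_{ibc}$, are stored in a messy wide format. I have been fiddling with tidyr and reshape2 but can't figure it out (reshaping really is my nemesis).

Here is an example: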

df <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a1b1c1 = c(5, 
2, 1, 4, 3, 1, 0, 1, 3), a2b1c1 = c(3, 4, 1, 1, 3, 2, 1, 4, 4
), a3b1c1 = c(4, 0, 0, 1, 1, 1, 0, 0, 1), a1b2c1 = c(1, 0, 4, 
2, 4, 1, 0, 4, 2), a2b2c1 = c(2, 0, 1, 0, 1, 0, 3, 2, 0), a3b2c1 = c(2, 
4, 3, 0, 2, 3, 3, 3, 4), yc1 = c(1, 2, 2, 1, 2, 2, 2, 1, 1), a1b1c2 = c(4, 
2, 3, 0, 4, 4, 2, 1, 4), a2b1c2 = c(3, 0, 3, 3, 4, 4, 3, 2, 2
), a3b1c2 = c(3, 1, 0, 1, 4, 0, 2, 2, 3), a1b2c2 = c(2, 2, 0, 
3, 2, 1, 4, 1, 0), a2b2c2 = c(3, 0, 2, 3, 4, 4, 4, 0, 4), a3b2c2 = c(0, 
0, 0, 2, 0, 0, 1, 4, 3), yc2 = c(2, 2, 2, 1, 2, 2, 2, 1, 1), X = c(5, 
6, 3, 7, 4, 3, 2, 3, 2)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))

This is what I want (excerpt):

     id b     c         y    a1    a2    a3     X

1     1 b1    c1        1     5     3     4     5
2     1 b2    c1        1     1     2     2     5
3     1 b1    c2        2     4     3     3     5
4     1 b2    c2        2     2     3     0     5

Using tidyr and dplyr:

library(tidyverse)

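df %>% 
  # stack every a?b?c? column into name/value pairs
  pivot_longer(cols = matches("a.b.c."), names_to = "name", values_to = "value") %>% 
  # split e.g. "a1b2c1" by position into "a1", "b2", "c1"
  separate(name, into = c("a", "b", "c"), sep = c(2,4)) %>% 
  # pick the y value that belongs to the current c index
  mutate(y = case_when(c == "c1" ~ yc1,
                       c == "c2" ~ yc2)) %>% 
  # spread a1/a2/a3 back out into their own columns
  pivot_wider(names_from = a, values_from = value) %>% 
  # reorder the columns and drop yc1 and yc2
  select(id, b, c, y, a1, a2, a3, X)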

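First, convert all the a/b/c columns to long format and separate the three values into their own columns. Then use mutate with case_when to combine the y columns into one, based on the value of c (you could also use if_else for two options, but case_when scales to more values). Then pivot the a columns back to wide format and use select to put the columns in the right order and drop the yc1 and yc2 columns.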
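
As a possible shortcut, the long-then-wide round trip described above can be collapsed into a single pivot_longer call by using the special ".value" entry in names_to together with names_pattern. This is only a sketch, not part of the original answer: it assumes every index is a single digit, and it uses a cut-down copy of the example data (the first two ids), stored under the illustrative names df_small and long.

```r
library(dplyr)
library(tidyr)

# Cut-down copy of the example data: the first two ids only (illustrative name)
df_small <- data.frame(
  id = c(1, 2),
  a1b1c1 = c(5, 2), a2b1c1 = c(3, 4), a3b1c1 = c(4, 0),
  a1b2c1 = c(1, 0), a2b2c1 = c(2, 0), a3b2c1 = c(2, 4),
  yc1 = c(1, 2),
  a1b1c2 = c(4, 2), a2b1c2 = c(3, 0), a3b1c2 = c(3, 1),
  a1b2c2 = c(2, 2), a2b2c2 = c(3, 0), a3b2c2 = c(0, 0),
  yc2 = c(2, 2),
  X = c(5, 6)
)

long <- df_small %>%
  pivot_longer(
    cols = matches("^a\\d+b\\d+c\\d+$"),
    # ".value" keeps the first capture group (a1, a2, a3) as column names,
    # while the b and c groups become index columns
    names_to = c(".value", "b", "c"),
    names_pattern = "(a\\d)(b\\d)(c\\d)"
  ) %>%
  # pick the y value that belongs to the current c index
  mutate(y = if_else(c == "c1", yc1, yc2)) %>%
  select(id, b, c, y, a1, a2, a3, X)

print(long)
```

For id 1 this should give the same four rows as the excerpt in the question. With more than two values of c, case_when (as in the answer above) would replace if_else.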