How to concatenate two pairs of columns by name with shifting rows, in a dataframe with multiple column pairs

I have this dataframe:

     id    a1    a2    b1    b2    c1    c2
  <int> <int> <int> <int> <int> <int> <int>
1     1    83    33    55    33    85    86
2     2    37     0    60    98    51     0
3     3    97    71    85     8    44    40
4     4    51     6    43    15    55    57
5     5    28    53    62    73    70     9
df <- structure(list(id = 1:5, a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L, 
0L, 71L, 6L, 53L), b1 = c(55L, 60L, 85L, 43L, 62L), b2 = c(33L, 
98L, 8L, 15L, 73L), c1 = c(85L, 51L, 44L, 55L, 70L), c2 = c(86L, 
0L, 40L, 57L, 9L)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

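What I want: combine the columns that start with the same letter into a single column. Each row of the second column should be shifted down by one, so its values are interleaved with those of the first column. The new column should be named after the letter the two columns share.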

My desired output:

      id     a     b     c
   <dbl> <dbl> <dbl> <dbl>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9

I have tried the lag() function, but I don't know how to combine and shift the columns at the same time!
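To make the target concrete: the reshape is an interleave rather than a lag. Each letter's "1" value is followed by its "2" value for the same id.

Below is a minimal base R sketch that is not taken from any of the answers. It makes a few assumptions:
- It rebuilds the example data as a plain data.frame.
- Every letter has exactly two columns, suffixed 1 and 2.
- `interleave()` is a hypothetical helper introduced only for illustration.

```r
# Rebuild the example data from the question as a plain data.frame
df <- data.frame(
  id = 1:5,
  a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L, 0L, 71L, 6L, 53L),
  b1 = c(55L, 60L, 85L, 43L, 62L), b2 = c(33L, 98L, 8L, 15L, 73L),
  c1 = c(85L, 51L, 44L, 55L, 70L), c2 = c(86L, 0L, 40L, 57L, 9L)
)

# Hypothetical helper: rbind() stacks x and y as the two rows of a matrix,
# and c() reads that matrix column by column: x[1], y[1], x[2], y[2], ...
interleave <- function(x, y) c(rbind(x, y))

# Leading letters of the paired columns ("a", "b", "c"), assuming names
# look like <letter><digit>
prefixes <- unique(sub("[0-9]+$", "", setdiff(names(df), "id")))

# Each id appears twice, once per column of the pair
out <- data.frame(id = rep(df$id, each = 2))
for (p in prefixes) {
  out[[p]] <- interleave(df[[paste0(p, "1")]], df[[paste0(p, "2")]])
}
out
```

The `c(rbind(x, y))` idiom works because R stores matrices column-major. It assumes both columns of a pair have the same length, which holds for any two columns of one data frame.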


You can use the following solution. I also modified your dataset and added an id column:

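library(tidyr)

df %>%
  # ".value" makes the letter (first group) the new column name;
  # NA discards the digit (second group)
  pivot_longer(!id, names_to = c(".value", NA), names_pattern = "([[:alpha:]])(\\d)")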

# A tibble: 10 x 4
      id     a     b     c
   <int> <int> <int> <int>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9
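To see how `names_pattern` splits each column name, the same regular expression can be run through base R's `regexec()` and `regmatches()`. This is only a sketch of what the pattern captures, not how tidyr processes it internally. The column names are typed in by hand here as an assumption.

```r
# Column names from the example data (excluding id)
nms <- c("a1", "a2", "b1", "b2", "c1", "c2")

# Each element of parts is c(full match, group 1, group 2)
parts <- regmatches(nms, regexec("([[:alpha:]])(\\d)", nms))

# Group 1 is what pivot_longer() uses as the ".value" part (new column name);
# group 2 is matched but dropped because names_to has NA in that position
value_part <- vapply(parts, `[`, character(1), 2)
index_part <- vapply(parts, `[`, character(1), 3)
value_part
index_part
```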

We can use pivot_longer(), remove the digits from the names, then pivot_wider() and unnest():

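library(stringr)
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -id) %>%                          # one row per id and original column
  mutate(name = str_remove(name, "[0-9]")) %>%          # "a1" and "a2" both become "a"
  pivot_wider(names_from = name, values_fn = list) %>%  # two values per cell, kept as list-columns
  unnest(-id)                                           # expand the list-columns back into rows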

# A tibble: 10 x 4
      id     a     b     c
   <int> <int> <int> <int>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9
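Why is `unnest()` needed? After the digits are stripped, every (id, letter) combination holds two values, so `pivot_wider()` has to store them in list-columns.

The base R sketch below reproduces that intermediate grouping with `split()`. It is an analogy, not tidyr's implementation, and it rebuilds the example data as a plain data.frame.

```r
# Rebuild the example data from the question as a plain data.frame
df <- data.frame(
  id = 1:5,
  a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L, 0L, 71L, 6L, 53L),
  b1 = c(55L, 60L, 85L, 43L, 62L), b2 = c(33L, 98L, 8L, 15L, 73L),
  c1 = c(85L, 51L, 44L, 55L, 70L), c2 = c(86L, 0L, 40L, 57L, 9L)
)
value_cols <- setdiff(names(df), "id")

# Long format, like pivot_longer(cols = -id): unlist() concatenates the
# value columns one after another (a1, a2, b1, ...)
long <- data.frame(
  id    = rep(df$id, times = length(value_cols)),
  name  = rep(value_cols, each = nrow(df)),
  value = unlist(df[value_cols], use.names = FALSE)
)

# Strip the digit, like str_remove(name, "[0-9]")
long$name <- sub("[0-9]", "", long$name)

# Group values by (id, letter): every group ends up with two values,
# which is what pivot_wider() stores in a list-column cell
cells <- split(long$value, list(long$id, long$name))
cells[["1.a"]]
lengths(cells)
```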

Doing pivot_longer() followed by pivot_wider() is easier to read, but @Anoushiravan R's answer is more direct:

library(tidyverse)

df %>% 
  # df already has an id column; if yours doesn't, add one first with
  # rownames_to_column(var = "id")
  pivot_longer(-id) %>% # Make long
  mutate(order = str_sub(name, -1), name = str_sub(name, 1, 1)) %>% # Split "a1" into letter "a" and index "1"
  pivot_wider(names_from = name) %>% # Make wide again, one row per id and index
  select(-order) # Drop the ordering column

I think Anoushiravan's solution is the most concise approach. We could also use {dplyover} for this (disclaimer):

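library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover

df %>% 
  group_by(id) %>% 
  # pair each *1 column with its *2 counterpart and stack their values;
  # "{pre}" names each result after the shared prefix (a, b, c)
  summarise(across2(ends_with("1"),
                    ends_with("2"),
                    ~ c(.x, .y),
                    .names = "{pre}"))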
#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.

#> # A tibble: 10 x 4
#> # Groups:   id [5]
#>       id     a     b     c
#>    <int> <int> <int> <int>
#>  1     1    83    55    85
#>  2     1    33    33    86
#>  3     2    37    60    51
#>  4     2     0    98     0
#>  5     3    97    85    44
#>  6     3    71     8    40
#>  7     4    51    43    55
#>  8     4     6    15    57
#>  9     5    28    62    70
#> 10     5    53    73     9

Created on 2021-07-28 by the reprex package (v0.3.0)