Join values from two specific rows into a new row, with the values separated by semicolons, using dplyr

Similar to what unite() does for columns, is it possible to merge specific rows across columns, with the values separated by a semicolon?

In the example below, IC_1 and IC_2 are merged into a new row, with the values inside the brackets separated by a semicolon:

df <- structure(list(treatment = c("product", "product", "product", 
"product", "control", "control", "control", "control"), variable = c("A", 
"B", "IC_1", "IC_2", "A", "B", "IC_1", "IC_2"), X1 = 1:8, X2 = 8:15, 
    X3 = 16:23), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L))

  treatment variable    X1    X2    X3
  <chr>     <chr>    <int> <int> <int>
1 product   A            1     8    16
2 product   B            2     9    17
3 product   IC_1         3    10    18
4 product   IC_2         4    11    19
5 control   A            5    12    20
6 control   B            6    13    21
7 control   IC_1         7    14    22
8 control   IC_2         8    15    23 

Expected output:

  treatment variable X1    X2      X3     
  <chr>     <chr>    <chr> <chr>   <chr>  
1 product   A        1     8       16     
2 product   B        2     9       17     
3 product   IC       [3;4] [10;11] [18;19]
4 control   A        5     12      20     
5 control   B        6     13      21     
6 control   IC       [7;8] [14;15] [22;23]

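You can check whether variable contains "IC" and group on that, then use paste to glue the "IC" values together. Note, however, that this converts columns X1:X3 to character.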

library(tidyverse)

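df %>% 
  # Group IC_1 and IC_2 together under a single "IC" label
  group_by(treatment, variable = ifelse(grepl('IC', variable), 'IC', variable)) %>% 
  # Collapse each group to one row; multi-row groups are joined with ';'
  summarize(across(X1:X3, ~ifelse(length(.x) == 1, as.character(.x), paste(.x, collapse = ';')))) %>% 
  # Wrap the joined values in square brackets
  mutate(across(X1:X3, ~ifelse(grepl(';', .x), sprintf('[%s]', .x), .x)))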

  treatment variable X1    X2      X3     
  <chr>     <chr>    <chr> <chr>   <chr>  
1 control   A        5     12      20     
2 control   B        6     13      21     
3 control   IC       [7;8] [14;15] [22;23]
4 product   A        1     8       16     
5 product   B        2     9       17     
6 product   IC       [3;4] [10;11] [18;19]
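
Note that summarize() sorts the groups, so control comes before product in the result above. If the original treatment order matters, a variation along the following lines might help. It is only a sketch: collapse_vals is a hypothetical helper name, the data is rebuilt with tibble() so the snippet is self-contained, and it assumes dplyr 1.0.0 or later (for across() and .groups).

```r
library(dplyr)

# Same data as in the question, rebuilt so this snippet runs on its own
df <- tibble(
  treatment = rep(c("product", "control"), each = 4),
  variable  = rep(c("A", "B", "IC_1", "IC_2"), times = 2),
  X1 = 1:8, X2 = 8:15, X3 = 16:23
)

# Hypothetical helper: keep a single value as-is, bracket several values
collapse_vals <- function(x) {
  if (length(x) == 1) {
    as.character(x)
  } else {
    paste0("[", paste(x, collapse = ";"), "]")
  }
}

res <- df %>%
  group_by(
    # A factor with levels in order of appearance keeps "product" first
    treatment = factor(treatment, levels = unique(treatment)),
    # Strip a trailing "_<digits>" so IC_1 and IC_2 fall into one group
    variable  = sub("_\\d+$", "", variable)
  ) %>%
  summarise(across(X1:X3, collapse_vals), .groups = "drop")

res
```

Here treatment comes back as a factor rather than a character column; wrapping it in as.character() afterwards would restore the original type.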

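Here is another dplyr solution: first we remove _1 and _2 so that we can create groups. Then we group_by and apply an ifelse statement across the columns that start with X. Afterwards we do some data wrangling.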

library(dplyr)
library(stringr)

df %>% 
  mutate(row = row_number()) %>% 
  mutate(variable = str_remove(variable, '_\\d')) %>% 
  group_by(treatment, variable) %>% 
  mutate(across(starts_with("X"), ~ifelse(
    variable == "IC", paste0("[", ., ";",lead(.), "]"),as.character(.)))
    ) %>% 
  slice(1) %>% 
  arrange(row) %>% 
  select(-row)

  treatment variable X1    X2      X3     
  <chr>     <chr>    <chr> <chr>   <chr>  
1 product   A        1     8       16     
2 product   B        2     9       17     
3 product   IC       [3;4] [10;11] [18;19]
4 control   A        5     12      20     
5 control   B        6     13      21     
6 control   IC       [7;8] [14;15] [22;23]
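
To see why slice(1) is needed in this second solution, it may help to look at what the paste0()/lead() step produces for a single IC group. The sketch below uses the two X1 values of the product IC rows as a stand-alone vector; x and paired are illustrative names only.

```r
library(dplyr)

x <- c(3L, 4L)  # X1 values of the IC_1 and IC_2 rows for "product"

# Pair each value with the one that follows it inside the group
paired <- paste0("[", x, ";", lead(x), "]")

paired
# The first element combines 3 with 4; the second has no following value,
# so lead() supplies NA and the string becomes "[4;NA]"
```

Only the first element of each group is the wanted pair, which is what slice(1) keeps. This also suggests the approach assumes exactly two IC rows per treatment; with more rows, only the first two values would be combined.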