Join values from two specific rows into a new row, with values separated by semicolons, using dplyr
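Analogous to what unite() does for columns, is it possible to merge specific rows across columns, with the values separated by a semicolon?
In the example below, IC_1 and IC_2 are merged into a new row, with the values wrapped in brackets and separated by ;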
df <- structure(list(treatment = c("product", "product", "product",
"product", "control", "control", "control", "control"), variable = c("A",
"B", "IC_1", "IC_2", "A", "B", "IC_1", "IC_2"), X1 = 1:8, X2 = 8:15,
    X3 = 16:23), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-8L))
  treatment variable    X1    X2    X3
  <chr>     <chr>    <int> <int> <int>
1 product   A            1     8    16
2 product   B            2     9    17
3 product   IC_1         3    10    18
4 product   IC_2         4    11    19
5 control   A            5    12    20
6 control   B            6    13    21
7 control   IC_1         7    14    22
8 control   IC_2         8    15    23
Desired output:
  treatment variable X1    X2      X3
  <chr>     <chr>    <chr> <chr>   <chr>
1 product   A        1     8       16
2 product   B        2     9       17
3 product   IC       [3;4] [10;11] [18;19]
4 control   A        5     12      20
5 control   B        6     13      21
6 control   IC       [7;8] [14;15] [22;23]
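For comparison, here is a small sketch of the column-wise operation the question refers to. It assumes the example data is stored in a tibble named `df` (the name is an assumption; the data is rebuilt here so the snippet runs on its own). `unite()` pastes several columns together within each row, whereas the question asks for the same kind of pasting across rows.

```r
library(tidyr)

# Rebuild the example data from the question (the name `df` is an assumption)
df <- tibble::tibble(
  treatment = rep(c("product", "control"), each = 4),
  variable  = rep(c("A", "B", "IC_1", "IC_2"), times = 2),
  X1 = 1:8, X2 = 8:15, X3 = 16:23
)

# unite() replaces X1, X2 and X3 with one character column holding "X1;X2;X3"
united <- unite(df, "X", X1:X3, sep = ";")
united$X[1:2]
```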
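You can check whether variable contains "IC" and group on that, then use paste to glue the "IC" values together. Note, however, that this changes columns X1:X3 to character data.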
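library(tidyverse)

df %>%
  # Collapse IC_1 and IC_2 into a single "IC" group within each treatment
  group_by(treatment, variable = ifelse(grepl('IC', variable), 'IC', variable)) %>%
  # Single-row groups keep their value; multi-row groups are joined with ";"
  summarize(across(X1:X3, ~ifelse(length(.x) == 1, as.character(.x), paste(.x, collapse = ';')))) %>%
  # Wrap the joined values in brackets
  mutate(across(X1:X3, ~ifelse(grepl(';', .x), sprintf('[%s]', .x), .x)))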
  treatment variable X1    X2      X3
  <chr>     <chr>    <chr> <chr>   <chr>
1 control   A        5     12      20
2 control   B        6     13      21
3 control   IC       [7;8] [14;15] [22;23]
4 product   A        1     8       16
5 product   B        2     9       17
6 product   IC       [3;4] [10;11] [18;19]
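Note that the result above is sorted by group, so control comes before product, unlike the desired output. One possible way to keep the input order is to turn treatment into a factor whose levels follow their first appearance before grouping. This is only a sketch of that idea, not part of the original answer; the name `res` and the use of `sub()` to strip the numeric suffix are choices made here.

```r
library(dplyr)

# Rebuild the example data from the question
df <- tibble::tibble(
  treatment = rep(c("product", "control"), each = 4),
  variable  = rep(c("A", "B", "IC_1", "IC_2"), times = 2),
  X1 = 1:8, X2 = 8:15, X3 = 16:23
)

res <- df %>%
  mutate(
    # Factor levels in order of first appearance: product, then control
    treatment = factor(treatment, levels = unique(treatment)),
    # Strip a trailing "_<digits>" so IC_1 and IC_2 share the label "IC"
    variable = sub("_[0-9]+$", "", variable)
  ) %>%
  group_by(treatment, variable) %>%
  summarise(
    across(X1:X3, ~ if (length(.x) > 1) {
      sprintf("[%s]", paste(.x, collapse = ";"))  # bracketed, ";"-separated
    } else {
      as.character(.x)                            # single value, left as-is
    }),
    .groups = "drop"
  ) %>%
  mutate(treatment = as.character(treatment))

res
```

Groups are ordered by factor level, so the product rows now come first. The X columns are still character, as noted above.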
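Here is another dplyr solution:
First we remove _1 and _2 so that we can create groups.
Then we group_by and apply an ifelse statement to the columns starting with X.
Afterwards we do some data wrangling: keep the first row of each group, restore the original row order, and drop the helper column.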
library(dplyr)
library(stringr)

df %>%
  mutate(row = row_number()) %>%                        # remember the original order
  mutate(variable = str_remove(variable, '_\\d')) %>%   # IC_1, IC_2 -> IC
  group_by(treatment, variable) %>%
  mutate(across(starts_with("X"), ~ifelse(
    variable == "IC", paste0("[", ., ";", lead(.), "]"), as.character(.)))
  ) %>%
  slice(1) %>%                                          # keep the first row of each group
  arrange(row) %>%
  select(-row)
  treatment variable X1    X2      X3
  <chr>     <chr>    <chr> <chr>   <chr>
1 product   A        1     8       16
2 product   B        2     9       17
3 product   IC       [3;4] [10;11] [18;19]
4 control   A        5     12      20
5 control   B        6     13      21
6 control   IC       [7;8] [14;15] [22;23]
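Because it pairs each value with lead(.), the approach above assumes exactly two IC rows per treatment. If there could be more, one option is to collapse the whole group with paste() instead. The sketch below uses made-up data with a hypothetical IC_3 row (not in the original example) and keeps the row helper column to restore the input order.

```r
library(dplyr)
library(stringr)

# Hypothetical data: IC_3 does not appear in the original question
df3 <- tibble::tibble(
  treatment = "product",
  variable  = c("A", "IC_1", "IC_2", "IC_3"),
  X1 = 1:4,
  X2 = 5:8
)

res3 <- df3 %>%
  mutate(
    row = row_number(),                           # remember the original order
    variable = str_remove(variable, "_\\d+$")     # IC_1, IC_2, IC_3 -> IC
  ) %>%
  group_by(treatment, variable) %>%
  summarise(
    row = min(row),                               # position of the group's first row
    across(starts_with("X"), ~ if (length(.x) > 1) {
      paste0("[", paste(.x, collapse = ";"), "]") # join every value in the group
    } else {
      as.character(.x)
    }),
    .groups = "drop"
  ) %>%
  arrange(row) %>%
  select(-row)

res3
```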