How to concatenate two pairs of columns by name with shifting rows, in a dataframe with multiple column pairs
I have this data frame:
     id    a1    a2    b1    b2    c1    c2
  <int> <int> <int> <int> <int> <int> <int>
1     1    83    33    55    33    85    86
2     2    37     0    60    98    51     0
3     3    97    71    85     8    44    40
4     4    51     6    43    15    55    57
5     5    28    53    62    73    70     9
df <- structure(list(id = 1:5, a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L,
0L, 71L, 6L, 53L), b1 = c(55L, 60L, 85L, 43L, 62L), b2 = c(33L,
98L, 8L, 15L, 73L), c1 = c(85L, 51L, 44L, 55L, 70L), c2 = c(86L,
0L, 40L, 57L, 9L)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
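I would like to:
Combine the columns that share the same starting character into one column, by shifting each row of the second column down by 1 and naming the new column after the character the two columns share.
My desired output:
      id     a     b     c
   <dbl> <dbl> <dbl> <dbl>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9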
I have tried using the lag function, but I don't know how to combine and shift the columns at the same time!
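You can use the following solution. I also modified your data set and added an id column:
library(tidyr)
# ".value" turns the letter into the output column name; NA drops the digit
df %>%
  pivot_longer(!id, names_to = c(".value", NA), names_pattern = "([[:alpha:]])(\\d)")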
# A tibble: 10 x 4
      id     a     b     c
   <int> <int> <int> <int>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9
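To see what the second capture group is doing, here is a minimal variation that keeps the digit instead of discarding it with NA. The column name `pair` is my own choice for illustration and is not part of the answer above; the data frame is rebuilt with `data.frame()` only so the sketch is self-contained.

```r
library(tidyr)

df <- data.frame(
  id = 1:5,
  a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L, 0L, 71L, 6L, 53L),
  b1 = c(55L, 60L, 85L, 43L, 62L), b2 = c(33L, 98L, 8L, 15L, 73L),
  c1 = c(85L, 51L, 44L, 55L, 70L), c2 = c(86L, 0L, 40L, 57L, 9L)
)

# ".value" sends the first capture group (the letter) to the output column
# names. The second group (the digit) is stored in a column called "pair"
# (a hypothetical name) rather than being dropped with NA.
long <- pivot_longer(
  df,
  cols = !id,
  names_to = c(".value", "pair"),
  names_pattern = "([[:alpha:]])(\\d)"
)

long
```

Keeping the digit can be handy if you later need to know which of the two original columns a row came from.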
We can use pivot_longer, remove the digits from the names, then pivot_wider and unnest:
library(stringr)
library(dplyr)
library(tidyr)
df %>% pivot_longer(cols = -id)%>%
mutate(name=str_remove(name, '[0-9]'))%>%
pivot_wider(names_from = name)%>%
unnest(everything())
# A tibble: 10 x 4
      id     a     b     c
   <int> <int> <int> <int>
 1     1    83    55    85
 2     1    33    33    86
 3     2    37    60    51
 4     2     0    98     0
 5     3    97    85    44
 6     3    71     8    40
 7     4    51    43    55
 8     4     6    15    57
 9     5    28    62    70
10     5    53    73     9
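Once the digits are removed, each id has two values per name, so pivot_wider() has to store them in list-columns, and it normally warns about the duplicates. As a sketch of the intermediate step, and assuming you want to make that explicit, passing `values_fn = list` requests the list-columns directly. The variable names `wide` and `res` are mine.

```r
library(dplyr)
library(stringr)
library(tidyr)

df <- data.frame(
  id = 1:5,
  a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L, 0L, 71L, 6L, 53L),
  b1 = c(55L, 60L, 85L, 43L, 62L), b2 = c(33L, 98L, 8L, 15L, 73L),
  c1 = c(85L, 51L, 44L, 55L, 70L), c2 = c(86L, 0L, 40L, 57L, 9L)
)

# One row per id; a, b and c are list-columns holding two integers each
wide <- df %>%
  pivot_longer(cols = -id) %>%
  mutate(name = str_remove(name, "[0-9]")) %>%
  pivot_wider(names_from = name, values_from = value, values_fn = list)

# Expanding the list-columns gives two rows per id
res <- unnest(wide, cols = c(a, b, c))
res
```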
Going through pivot_longer() and then pivot_wider() is easier to read, but @Anoushiravan R's answer is more direct.
library(tidyverse)
# df already has an id column; if yours does not, add one first with
# rownames_to_column(var = "id")
df %>%
  pivot_longer(-id) %>% # Make long
  mutate(order = str_sub(name, -1), name = str_sub(name, 1, 1)) %>% # Split the name into letter and digit
  pivot_wider(names_from = name) %>% # Make wide again
  select(-order) # Drop the ordering column
I think Anoushiravan's solution is the most concise approach. We can also use {dplyover} for this (disclaimer):
library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover
df %>%
group_by(id) %>%
summarise(across2(ends_with("1"),
ends_with("2"),
~ c(.x,.y),
.names = "{pre}"),
)
#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
#> # A tibble: 10 x 4
#> # Groups: id [5]
#>       id     a     b     c
#>    <int> <int> <int> <int>
#>  1     1    83    55    85
#>  2     1    33    33    86
#>  3     2    37    60    51
#>  4     2     0    98     0
#>  5     3    97    85    44
#>  6     3    71     8    40
#>  7     4    51    43    55
#>  8     4     6    15    57
#>  9     5    28    62    70
#> 10     5    53    73     9
Created on 2021-07-28 by the reprex package (v0.3.0)
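For comparison, the "shift the second column down by one" idea from the question can also be sketched in base R without any packages. This is not one of the answers above, only an illustration of the interleaving itself, and it assumes every pair is named with a prefix followed by 1 and 2.

```r
df <- data.frame(
  id = 1:5,
  a1 = c(83L, 37L, 97L, 51L, 28L), a2 = c(33L, 0L, 71L, 6L, 53L),
  b1 = c(55L, 60L, 85L, 43L, 62L), b2 = c(33L, 98L, 8L, 15L, 73L),
  c1 = c(85L, 51L, 44L, 55L, 70L), c2 = c(86L, 0L, 40L, 57L, 9L)
)

# Column-name prefixes ("a", "b", "c"), taken from every column except id
value_cols <- setdiff(names(df), "id")
prefixes <- unique(sub("[0-9]+$", "", value_cols))

# Repeat each id twice, once per member of a pair
out <- data.frame(id = rep(df$id, each = 2))

for (p in prefixes) {
  pair <- df[, paste0(p, 1:2)] # e.g. columns a1 and a2
  # t() puts the pair in rows; c() then reads column by column,
  # giving a1[1], a2[1], a1[2], a2[2], ...
  out[[p]] <- c(t(as.matrix(pair)))
}

out
```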