Special join of four columns into two new ones in R
I'm working in R and ran into an interesting problem.
I want to convert the following data frame:
DF = data.frame(ID = c(1, 2, 3),
                Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                Sign = "artist",
                Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                Sex = c("female", "male", "female"),
                Score = c(10, 20, 30))
  ID     Person1   Sign      Person2    Sex Score
1  1 Devin Davey artist    Eli Greer female    10
2  2    Rui Butt artist Alvin Simons   male    20
3  3 Keon Dotson artist   Leona Ford female    30
into the following format:
  ID         Name   Sign Score
1  1  Devin Davey artist    10
2  1    Eli Greer female    10
3  2     Rui Butt artist    20
4  2 Alvin Simons   male    20
5  3  Keon Dotson artist    30
6  3   Leona Ford female    30
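In other words, the four columns need to be combined into two new columns in a particular way: Person1 and Person2 become Name, and Sign and Sex become Sign.
I came up with the following: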
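library(dplyr)
library(tidyr)
# Stack the two name columns and number the rows within the sorted result
PART1 <- DF %>%
  select(ID, Person1, Person2, Score) %>%
  gather(key, Name, -c(ID, Score), na.rm = TRUE) %>%
  select(-key) %>%
  arrange(ID) %>%
  mutate(temp_id = 1:n())
# Stack Sign and Sex the same way so the row numbers line up
PART2 <- DF %>%
  select(ID, Sign, Sex) %>%
  gather(key, Sign, -ID, na.rm = TRUE) %>%
  select(-key) %>%
  arrange(ID) %>%
  mutate(temp_id = 1:n())
# Join the two halves on ID and the temporary row number
PART1 %>%
  left_join(PART2, by = c("ID" = "ID", "temp_id" = "temp_id")) %>%
  select(-temp_id) %>%
  relocate(Score, .after = Sign)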
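But this solution doesn't feel very elegant, and I think the problem can be solved in a better way.
So I would be grateful for any ideas on how to solve this with the tidyverse.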
We can rename 'Sign' and 'Sex' to a common name 'Sign' with a numeric suffix appended, so that they match the Person columns, and then use pivot_longer
library(dplyr)
library(tidyr)
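DF %>%
  rename_at(vars(c('Sign', 'Sex')), ~ c('Sign1', 'Sign2')) %>%
  # split each name between the trailing letter and the digit;
  # the backslash must be doubled inside an R string
  pivot_longer(cols = -c(ID, Score), names_to = c(".value", "grp"),
               names_sep = "(?<=[a-z])(?=\\d)") %>%
  select(ID, Name = Person, Sign, Score)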
-output
# A tibble: 6 x 4
# ID Name Sign Score
# <dbl> <chr> <chr> <dbl>
#1 1 Devin Davey artist 10
#2 1 Eli Greer female 10
#3 2 Rui Butt artist 20
#4 2 Alvin Simons male 20
#5 3 Keon Dotson artist 30
#6 3 Leona Ford female 30
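As a variation on the answer above, the column names can also be split with `names_pattern`, which captures the two pieces directly instead of relying on a lookaround separator. This is only a sketch: it swaps in `rename()` for `rename_at()`, and the regex `([A-Za-z]+)(\\d+)` and the result name `long` are my own choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(ID = c(1, 2, 3),
                 Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                 Sign = "artist",
                 Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                 Sex = c("female", "male", "female"),
                 Score = c(10, 20, 30))

long <- DF %>%
  # give Sign/Sex the same numeric suffixes as Person1/Person2
  rename(Sign1 = Sign, Sign2 = Sex) %>%
  # first capture group -> new column name (.value), second -> group index
  pivot_longer(cols = -c(ID, Score),
               names_to = c(".value", "grp"),
               names_pattern = "([A-Za-z]+)(\\d+)") %>%
  select(ID, Name = Person, Sign, Score)

long
```

This should give the same six rows as the output above, with each person next to their own Sign value.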
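In base R, you can use the reshape function. Since it returns the rows in a different order, we reorder them to match the output above exactly, although that is not strictly necessary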
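# matrix(2:5, 2) pairs columns 2 and 4 (Person1, Person2) and columns 3 and 5 (Sign, Sex)
DF1 <- reshape(DF, varying = matrix(2:5, 2), direction = "long")
DF1[order(DF1$ID), c("ID", "Person1", "Sign", "Score")]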
ID Person1 Sign Score
1.1 1 Devin Davey artist 10
1.2 1 Eli Greer female 10
2.1 2 Rui Butt artist 20
2.2 2 Alvin Simons male 20
3.1 3 Keon Dotson artist 30
3.2 3 Leona Ford female 30
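If the stacked column should be called Name rather than Person1, reshape can be given the column sets by name together with `v.names`. The following is a minimal sketch of that variation, not the original answer's code; sorting on the generated `time` column and dropping the row names are my additions.

```r
DF <- data.frame(ID = c(1, 2, 3),
                 Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                 Sign = "artist",
                 Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                 Sex = c("female", "male", "female"),
                 Score = c(10, 20, 30))

DF1 <- reshape(DF,
               # each vector lists the wide columns that form one long column
               varying = list(c("Person1", "Person2"), c("Sign", "Sex")),
               # names of the resulting long columns
               v.names = c("Name", "Sign"),
               direction = "long")

# reshape() stacks all first persons before all second persons,
# so sort by ID and by the generated time index
DF1 <- DF1[order(DF1$ID, DF1$time), c("ID", "Name", "Sign", "Score")]
rownames(DF1) <- NULL
DF1
```

Naming the columns instead of passing indices keeps the call readable if the column order of DF ever changes.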
You can explicitly select the column names and use bind_rows
library(tidyverse)
# Person1 goes with Sign and Person2 goes with Sex, as in the desired output
bind_rows(DF %>% select(ID, Name = Person1, Sign, Score),
          DF %>% select(ID, Name = Person2, Sign = Sex, Score)) %>%
  arrange(ID)
#>   ID         Name   Sign Score
#> 1  1  Devin Davey artist    10
#> 2  1    Eli Greer female    10
#> 3  2     Rui Butt artist    20
#> 4  2 Alvin Simons   male    20
#> 5  3  Keon Dotson artist    30
#> 6  3   Leona Ford female    30
Or full_join
library(tidyverse)
# Person1 goes with Sign and Person2 goes with Sex, as in the desired output
DF %>% select(ID, Name = Person1, Sign, Score) %>%
  full_join(DF %>% select(ID, Name = Person2, Sign = Sex, Score)) %>%
  arrange(ID)
#> Joining, by = c("ID", "Name", "Sign", "Score")
#>   ID         Name   Sign Score
#> 1  1  Devin Davey artist    10
#> 2  1    Eli Greer female    10
#> 3  2     Rui Butt artist    20
#> 4  2 Alvin Simons   male    20
#> 5  3  Keon Dotson artist    30
#> 6  3   Leona Ford female    30