tidyr 跨多个列加入一个 ID table 和 main table
tidyr join an ID table with main table across multiple columns
这似乎是一个非常基本的操作,但我的搜索没有找到简单的解决方案。
作为我正在尝试做的事情的示例,请考虑数据库中的以下两个数据框。
首先是一个 ID table,它将索引分配给颜色名称:
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
ColorID
# A tibble: 4 x 2
ID Name
<int> <chr>
1 1 Red
2 2 Green
3 3 Blue
4 4 Black
接下来一些 table 指向这些颜色索引(而不是存储文本字符串):
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
Widgets
# A tibble: 6 x 4
Front Back Top Bottom
<dbl> <dbl> <dbl> <dbl>
1 1 4 4 1
2 3 4 3 2
3 4 3 2 3
4 2 3 1 4
5 1 1 2 3
6 1 2 3 2
现在我只想加入两个 table 以用实际颜色名称替换索引值,所以我想要的是:
Joined <- tibble(Front = c("Red", "Blue", "Black", "Green", "Red","Red"),
Back = c("Black", "Black", "Blue","Blue", "Red", "Green"),
Top = c("Black","Blue", "Green", "Red", "Green", "Blue"),
Bottom = c("Red", "Green", "Blue", "Black", "Blue","Green"))
Joined
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
我试了很多次都没有成功,我认为可行的是:
J <- Widgets %>% inner_join(ColorID, by = c(. = "ID"))
我可以通过一次使用一个变量逐列处理这一列,例如
J <- Widgets %>% inner_join(ColorID, by = c("Front" = "ID"))
它不会替换“Front”,而是创建一个新的“Name”列。似乎必须有一个简单的解决方案来解决这个问题。谢谢。
这个有用吗:
library(dplyr)
library(tidyr)
Widgets %>% pivot_longer(everything()) %>%
inner_join(ColorID, by = c('value' = 'ID')) %>% select(-value) %>%
pivot_wider(names_from = name, values_from = Name) %>% unnest(everything())
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
不需要连接函数:
library(dplyr)
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
# reorder so that row number and ID are different
ColorID <- ColorID[c(2, 1, 4, 3), ]
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
check_id <- function(col){
ColorID$Name[match(col, ColorID$ID)]
}
Widgets %>%
mutate(across(everything(), check_id))
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
(已编辑)我对 dplyr 和 mutate 所做的是将 Widgets
上的数字与 ColorID$ID
列上的数字相匹配。这为我提供了提取名称所需的 ColorID 数据框上的行。
这似乎是一个非常基本的操作,但我的搜索没有找到简单的解决方案。 作为我正在尝试做的事情的示例,请考虑数据库中的以下两个数据框。 首先是一个 ID table,它将索引分配给颜色名称:
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
ColorID
# A tibble: 4 x 2
ID Name
<int> <chr>
1 1 Red
2 2 Green
3 3 Blue
4 4 Black
接下来一些 table 指向这些颜色索引(而不是存储文本字符串):
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
Widgets
# A tibble: 6 x 4
Front Back Top Bottom
<dbl> <dbl> <dbl> <dbl>
1 1 4 4 1
2 3 4 3 2
3 4 3 2 3
4 2 3 1 4
5 1 1 2 3
6 1 2 3 2
现在我只想加入两个 table 以用实际颜色名称替换索引值,所以我想要的是:
Joined <- tibble(Front = c("Red", "Blue", "Black", "Green", "Red","Red"),
Back = c("Black", "Black", "Blue","Blue", "Red", "Green"),
Top = c("Black","Blue", "Green", "Red", "Green", "Blue"),
Bottom = c("Red", "Green", "Blue", "Black", "Blue","Green"))
Joined
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
我试了很多次都没有成功,我认为可行的是:
J <- Widgets %>% inner_join(ColorID, by = c(. = "ID"))
我可以通过一次使用一个变量逐列处理这一列,例如
J <- Widgets %>% inner_join(ColorID, by = c("Front" = "ID"))
它不会替换“Front”,而是创建一个新的“Name”列。似乎必须有一个简单的解决方案来解决这个问题。谢谢。
这个有用吗:
library(dplyr)
library(tidyr)
Widgets %>% pivot_longer(everything()) %>%
inner_join(ColorID, by = c('value' = 'ID')) %>% select(-value) %>%
pivot_wider(names_from = name, values_from = Name) %>% unnest(everything())
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
不需要连接函数:
library(dplyr)
ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
# reorder so that row number and ID are different
ColorID <- ColorID[c(2, 1, 4, 3), ]
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
check_id <- function(col){
ColorID$Name[match(col, ColorID$ID)]
}
Widgets %>%
mutate(across(everything(), check_id))
# A tibble: 6 x 4
Front Back Top Bottom
<chr> <chr> <chr> <chr>
1 Red Black Black Red
2 Blue Black Blue Green
3 Black Blue Green Blue
4 Green Blue Red Black
5 Red Red Green Blue
6 Red Green Blue Green
(已编辑)我对 dplyr 和 mutate 所做的是将 Widgets
上的数字与 ColorID$ID
列上的数字相匹配。这为我提供了提取名称所需的 ColorID 数据框上的行。