Return values not found for each ID - R
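I want to identify the unmatched values for each vendor in the Vendors data frame. In other words, for each vendor, find the countries that are not listed for it in the Vendors data frame.
I have a data frame (Vendors) that looks like this: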
Vendor_ID | Vendor | Country_ID | Country |
---|---|---|---|
1 | Burger King | 2 | USA |
1 | Burger King | 3 | France |
1 | Burger King | 5 | Brazil |
1 | Burger King | 7 | Turkey |
2 | McDonald's | 5 | Brazil |
2 | McDonald's | 3 | France |
Vendors <- data.frame (
Vendor_ID = c("1", "1", "1", "1", "2", "2"),
Vendor = c("Burger King", "Burger King", "Burger King", "Burger King", "McDonald's", "McDonald's"),
Country_ID = c("2", "3", "5", "7", "5", "3"),
Country = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"))
I also have another data frame (Countries) that looks like this:
Country_ID | Country |
---|---|
2 | USA |
3 | France |
5 | Brazil |
7 | Turkey |
Countries <- data.frame (Country_ID = c("2", "3", "5", "7"),
Country = c("USA", "France", "Brazil", "Turkey"))
Desired output:
Vendor_ID | Vendor | Country_ID | Country |
---|---|---|---|
2 | McDonald's | 2 | USA |
2 | McDonald's | 7 | Turkey |
Can anyone tell me how to achieve this in R? I tried subset and anti-join, but the results were not correct.
In base R, we can first split the data by vendor:
VenList <- split(df, df$Vendor)
Then we can check which countries are missing for each vendor and return them.
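res <- lapply(VenList, function(x) {
  # Countries from df2 that do not appear for this vendor
  tmp1 <- df2[!(df2[, "Country"] %in% x[, "Country"]), ]
  # Vendor_ID and Vendor columns, one row per missing country
  # (this relies on the vendor having at least as many rows as missing countries)
  tmp2 <- x[1:nrow(tmp1), 1:2]
  # Bind the two pieces only when their row counts match;
  # a vendor with no missing countries falls through and returns NULL
  if (nrow(tmp2) == nrow(tmp1)) {
    cbind(tmp2, tmp1)
  }
})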
# Which yields
res
# $BurgerKing
# NULL
#
# $`McDonald's`
# Vendor_ID Vendor Country_ID Country
# 5 2 McDonald's 2 USA
# 6 2 McDonald's 7 Turkey
# If you want it as one df you could then flatten to
do.call(rbind, res)
# Vendor_ID Vendor Country_ID Country
# McDonald's.5 2 McDonald's 2 USA
# McDonald's.6 2 McDonald's 7 Turkey
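The split-and-check idea above can also be written without a per-vendor loop. The following is only a sketch under stated assumptions, not part of the original answer: it rebuilds the example data inline so it runs on its own, and the names `vendors`, `all_pairs`, `key`, `have` and `missing_pairs` are made up for illustration. It uses `merge()` with `by = NULL` to form every vendor-country pair, then keeps the pairs that do not occur in the vendor data.

```r
# Example data mirroring the question (defined here so the block is self-contained)
df <- data.frame(
  Vendor_ID  = c(1, 1, 1, 1, 2, 2),
  Vendor     = c("BurgerKing", "BurgerKing", "BurgerKing", "BurgerKing",
                 "McDonald's", "McDonald's"),
  Country_ID = c(2, 3, 5, 7, 5, 3),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Country_ID = c(2, 3, 5, 7),
  Country    = c("USA", "France", "Brazil", "Turkey"),
  stringsAsFactors = FALSE
)

# One row per vendor (hypothetical helper table)
vendors <- unique(df[, c("Vendor_ID", "Vendor")])

# Cartesian product: every vendor paired with every country
all_pairs <- merge(vendors, df2, by = NULL)

# Build a simple composite key and keep the pairs absent from the vendor data
key  <- paste(all_pairs$Vendor_ID, all_pairs$Country_ID)
have <- paste(df$Vendor_ID, df$Country_ID)
missing_pairs <- all_pairs[!(key %in% have), ]
rownames(missing_pairs) <- NULL

missing_pairs
```

Because each missing row is taken from the cross product rather than from the vendor's own rows, this variant does not depend on how many rows a vendor has.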
Data
df <- read.table(text = "1 BurgerKing 2 USA
1 BurgerKing 3 France
1 BurgerKing 5 Brazil
1 BurgerKing 7 Turkey
2 McDonald's 5 Brazil
2 McDonald's 3 France", col.names = c("Vendor_ID", "Vendor", "Country_ID", "Country"))
df2 <- read.table(text = "2 USA
3 France
5 Brazil
7 Turkey", col.names = c("Country_ID", "Country"))
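A solution using expand.grid:
Create all possible vendor-country combinations (assuming "Countries" has only one entry per country), then use dplyr to join "Vendors" and find the missing countries.
Edit: the last two lines (the left_joins) are only needed to "translate" the ID columns into text: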
library(dplyr)
expand.grid(Vendor_ID=unique(Vendors$Vendor_ID), Country_ID=Countries$Country_ID) %>%
left_join(Vendors) %>%
filter(is.na(Vendor)) %>%
select(Vendor_ID, Country_ID) %>%
left_join(Countries) %>%
left_join(unique(Vendors[, c("Vendor_ID", "Vendor")]))
Returns
Vendor_ID Country_ID Country Vendor
1 2 2 USA McDonald's
2 2 7 Turkey McDonald's
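Since the question mentions an anti-join attempt, here is a hedged sketch of how `dplyr::anti_join()` could replace the `left_join()` plus `filter(is.na(...))` step. It is an illustration rather than the answerer's code: the data is rebuilt inline, the `by` columns are spelled out explicitly, and the names `vendor_names` and `missing_countries` are assumptions introduced here.

```r
library(dplyr)

# Data as given in the question
Vendors <- data.frame(
  Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
  Vendor     = c("Burger King", "Burger King", "Burger King", "Burger King",
                 "McDonald's", "McDonald's"),
  Country_ID = c("2", "3", "5", "7", "5", "3"),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"),
  stringsAsFactors = FALSE
)
Countries <- data.frame(
  Country_ID = c("2", "3", "5", "7"),
  Country    = c("USA", "France", "Brazil", "Turkey"),
  stringsAsFactors = FALSE
)

# One row per vendor, used to look the vendor name back up (hypothetical helper)
vendor_names <- distinct(Vendors, Vendor_ID, Vendor)

missing_countries <- expand.grid(
  Vendor_ID  = vendor_names$Vendor_ID,
  Country_ID = Countries$Country_ID,
  stringsAsFactors = FALSE          # keep IDs as character so the join keys match
) %>%
  # Drop every vendor-country pair that already exists in Vendors
  anti_join(Vendors, by = c("Vendor_ID", "Country_ID")) %>%
  # Translate the IDs back into names
  left_join(vendor_names, by = "Vendor_ID") %>%
  left_join(Countries, by = "Country_ID") %>%
  select(Vendor_ID, Vendor, Country_ID, Country)

missing_countries
```

The anti-join has to run against the full set of vendor-country pairs; anti-joining `Countries` against `Vendors` directly would only find countries that no vendor has at all, which may be why the earlier attempt returned nothing useful.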