Return values not found for each ID - R

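I want to identify the unmatched values for each vendor in the Vendors data frame. In other words, for each vendor, find the countries that do not appear for that vendor in the Vendors data frame.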

I have a data frame (Vendors) that looks like this:

Vendor_ID  Vendor       Country_ID  Country
1          Burger King  2           USA
1          Burger King  3           France
1          Burger King  5           Brazil
1          Burger King  7           Turkey
2          McDonald's   5           Brazil
2          McDonald's   3           France

Vendors <- data.frame(
  Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
  Vendor     = c("Burger King", "Burger King", "Burger King", "Burger King", "McDonald's", "McDonald's"),
  Country_ID = c("2", "3", "5", "7", "5", "3"),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"))

I also have another data frame (Countries) that looks like this:

Country_ID  Country
2           USA
3           France
5           Brazil
7           Turkey

Countries <- data.frame(
  Country_ID = c("2", "3", "5", "7"),
  Country    = c("USA", "France", "Brazil", "Turkey"))

Desired output:

Vendor_ID  Vendor      Country_ID  Country
2          McDonald's  2           USA
2          McDonald's  7           Turkey

Can anyone tell me how to do this in R? I tried subset and anti_join, but the results were not correct.

In base R, we can first split the data by vendor:

VenList <- split(df, df$Vendor)
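
To make the split step concrete, here is a minimal sketch of the same idea on plain vectors. The data is rebuilt inside the snippet so it runs on its own, and the names all_countries, by_vendor and missing_by_vendor are illustrative, not part of the original answer.

```r
# Toy data mirroring the question (rebuilt here so the snippet is self-contained)
df <- data.frame(
  Vendor_ID  = c(1, 1, 1, 1, 2, 2),
  Vendor     = c("BurgerKing", "BurgerKing", "BurgerKing", "BurgerKing",
                 "McDonald's", "McDonald's"),
  Country_ID = c(2, 3, 5, 7, 5, 3),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"),
  stringsAsFactors = FALSE
)

# Assumed reference list of every country
all_countries <- c("USA", "France", "Brazil", "Turkey")

# One character vector of countries per vendor
by_vendor <- split(df$Country, df$Vendor)

# Countries from the reference list that each vendor lacks
missing_by_vendor <- lapply(by_vendor, function(x) setdiff(all_countries, x))
missing_by_vendor
```

This only returns the country names per vendor; the lapply over the split data frames is what carries the ID columns along.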

Then, for each vendor, we can check which countries are missing and return them.

res <- lapply(VenList, function(x) {

  # Countries in df2 that this vendor does not have
  tmp1 <- df2[!(df2[, "Country"] %in% x[, "Country"]), ]

  # Nothing is missing for this vendor
  if (nrow(tmp1) == 0) return(NULL)

  # Repeat the vendor ID and name once per missing country
  out <- data.frame(Vendor_ID = x$Vendor_ID[1], Vendor = x$Vendor[1], tmp1)
  rownames(out) <- NULL
  out
})

# Which yields

res

# $BurgerKing
# NULL
# 
# $`McDonald's`
#   Vendor_ID     Vendor Country_ID Country
# 1         2 McDonald's          2     USA
# 2         2 McDonald's          7  Turkey

# If you want it as one df you could then flatten to 

do.call(rbind, res)

#              Vendor_ID     Vendor Country_ID Country
# McDonald's.1         2 McDonald's          2     USA
# McDonald's.2         2 McDonald's          7  Turkey
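
If you would rather avoid the per-vendor loop, a cross join is one possible alternative in base R. This is only a sketch under the assumption that each country appears once in df2. The names vendors, all_pairs, seen and missing are made up for illustration, and the data is rebuilt so the snippet runs on its own.

```r
# Toy data mirroring the question (rebuilt here so the snippet is self-contained)
df <- data.frame(
  Vendor_ID  = c(1, 1, 1, 1, 2, 2),
  Vendor     = c("BurgerKing", "BurgerKing", "BurgerKing", "BurgerKing",
                 "McDonald's", "McDonald's"),
  Country_ID = c(2, 3, 5, 7, 5, 3),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Country_ID = c(2, 3, 5, 7),
  Country    = c("USA", "France", "Brazil", "Turkey"),
  stringsAsFactors = FALSE
)

# Every vendor paired with every country (by = NULL gives the Cartesian product)
vendors   <- unique(df[, c("Vendor_ID", "Vendor")])
all_pairs <- merge(vendors, df2, by = NULL)

# Keep only the vendor/country pairs that never occur in df
seen    <- paste(df$Vendor_ID, df$Country_ID)
missing <- all_pairs[!(paste(all_pairs$Vendor_ID, all_pairs$Country_ID) %in% seen), ]
rownames(missing) <- NULL
missing
```

Pasting the two IDs into one key is a simple way to compare pairs. It assumes the IDs have the same type in both data frames.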

Data

# quote = "" keeps the apostrophe in McDonald's as a literal character
df <- read.table(text = "1 BurgerKing 2 USA
1 BurgerKing 3 France
1 BurgerKing 5 Brazil
1 BurgerKing 7 Turkey
2 McDonald's 5 Brazil
2 McDonald's 3 France",
  col.names = c("Vendor_ID", "Vendor", "Country_ID", "Country"),
  quote = "")

df2 <- read.table(text = "2 USA
3 France
5 Brazil
7 Turkey", col.names = c("Country_ID", "Country"))

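This solution uses expand.grid to create every possible vendor-country combination (assuming "Countries" has only one entry per country), then uses dplyr to join in "Vendors" and find the missing countries.

Edit: the last two lines (the left_joins) are only needed to "translate" the ID columns into text: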

library(dplyr)

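# stringsAsFactors = FALSE keeps the IDs as character, matching Vendors and Countries
expand.grid(Vendor_ID = unique(Vendors$Vendor_ID),
            Country_ID = Countries$Country_ID,
            stringsAsFactors = FALSE) %>% 
  left_join(Vendors, by = c("Vendor_ID", "Country_ID")) %>%  # Vendor is NA where the pair is absent
  filter(is.na(Vendor)) %>%
  select(Vendor_ID, Country_ID) %>% 
  left_join(Countries, by = "Country_ID") %>%                # add the country name
  left_join(unique(Vendors[, c("Vendor_ID", "Vendor")]), by = "Vendor_ID")  # add the vendor name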

Returns

  Vendor_ID Country_ID Country     Vendor
1         2          2     USA McDonald's
2         2          7  Turkey McDonald's
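
Since the question mentions an anti-join, here is a hedged sketch of how the same grid could be filtered with dplyr's anti_join instead of left_join plus is.na. It assumes the Vendors and Countries data frames from the question (rebuilt here so the snippet runs on its own). The name missing is illustrative.

```r
library(dplyr)

# Data frames from the question, rebuilt so the snippet is self-contained
Vendors <- data.frame(
  Vendor_ID  = c("1", "1", "1", "1", "2", "2"),
  Vendor     = c("Burger King", "Burger King", "Burger King", "Burger King",
                 "McDonald's", "McDonald's"),
  Country_ID = c("2", "3", "5", "7", "5", "3"),
  Country    = c("USA", "France", "Brazil", "Turkey", "Brazil", "France"),
  stringsAsFactors = FALSE
)
Countries <- data.frame(
  Country_ID = c("2", "3", "5", "7"),
  Country    = c("USA", "France", "Brazil", "Turkey"),
  stringsAsFactors = FALSE
)

missing <- expand.grid(
  Vendor_ID  = unique(Vendors$Vendor_ID),
  Country_ID = Countries$Country_ID,
  stringsAsFactors = FALSE
) %>%
  # Drop every vendor/country pair that already exists in Vendors
  anti_join(Vendors, by = c("Vendor_ID", "Country_ID")) %>%
  # Translate the IDs back into names
  left_join(distinct(Vendors, Vendor_ID, Vendor), by = "Vendor_ID") %>%
  left_join(Countries, by = "Country_ID") %>%
  select(Vendor_ID, Vendor, Country_ID, Country)

missing
```

An anti-join directly between Vendors and Countries cannot work here, because the missing rows exist in neither table. They have to be generated first, which is what the grid of all combinations is for.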