How to recode values based on duplicate values in another dataset

Question

I am working with the following data. Let's call it x:

   New_Name_List               Old_Name_List
1     bumiputera        bumiputera (muslims)
2     bumiputera          bumiputera (other)
3 non bumiputera non bumiputera (indigenous)
4        chinese                     chinese

The goal is to recode the data in another data object, shown below. Let's call it y:

  EPR_country_code EPR_country           EPR_group_lower_2
1              835      Brunei        bumiputera (muslims)
2              835      Brunei          bumiputera (other)
3              835      Brunei non bumiputera (indigenous)
4              835      Brunei                     chinese

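If a value in x$New_Name_List is duplicated, I want the corresponding x$Old_Name_List value in a new column, y$EPR_group_lower_3.

If a value in x$New_Name_List is unique, I want the x$New_Name_List value in the new column y$EPR_group_lower_3.

The final data should look like this: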

  EPR_country_code EPR_country           EPR_group_lower_2  EPR_group_lower_3
1              835      Brunei        bumiputera (muslims)  bumiputera (muslims)
2              835      Brunei          bumiputera (other)  bumiputera (other)
3              835      Brunei non bumiputera (indigenous)  non bumiputera
4              835      Brunei                     chinese  chinese

Thanks a lot.

Answer 1

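We can use ifelse to pick the value from Old_Name_List or New_Name_List, depending on whether the New_Name_List value is duplicated. Note that the result is assigned to y by position, so this relies on the rows of x and y being in the same order.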

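# Flag every row whose New_Name_List value occurs more than once;
# fromLast = TRUE is needed to also catch the first occurrence.
y$EPR_group_lower_3 <- with(x, ifelse(duplicated(New_Name_List) | 
        duplicated(New_Name_List, fromLast = TRUE), Old_Name_List, New_Name_List))
y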

#  EPR_country_code EPR_country           EPR_group_lower_2    EPR_group_lower_3
#1              835      Brunei        bumiputera (muslims) bumiputera (muslims)
#2              835      Brunei          bumiputera (other)   bumiputera (other)
#3              835      Brunei non bumiputera (indigenous)       non bumiputera
#4              835      Brunei                     chinese              chinese
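
As a side note on why the condition calls duplicated twice: duplicated on its own flags only the second and later occurrences of a value, so the first "bumiputera" would be missed. The small sketch below uses a stand-alone vector (new_names, later, earlier and is_dup are illustrative names, not part of the original answer) to show how the two calls combine.

```r
new_names <- c("bumiputera", "bumiputera", "non bumiputera", "chinese")

# Scanning from the start flags only the second and later occurrences.
later <- duplicated(new_names)

# Scanning from the end flags every occurrence except the last one.
earlier <- duplicated(new_names, fromLast = TRUE)

# Combining both flags every member of a duplicated group.
is_dup <- later | earlier
is_dup
# [1]  TRUE  TRUE FALSE FALSE
```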

Alternatively, find the indices where the value is duplicated and replace only those.

# Start from the new names, then overwrite only the duplicated ones.
y$EPR_group_lower_3 <- x$New_Name_List
inds <- with(x, duplicated(New_Name_List) | duplicated(New_Name_List, fromLast = TRUE))
y$EPR_group_lower_3[inds] <- x$Old_Name_List[inds]
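
Both approaches above copy values from x to y by row position. If the rows of y might be in a different order, a lookup by key is one possible variation. The sketch below is only an illustration under two assumptions that the question does not state: that y$EPR_group_lower_2 holds the same strings as x$Old_Name_List, and that those strings are unique in x. The column name recoded and the shuffled y are made up for the example.

```r
x <- data.frame(
  New_Name_List = c("bumiputera", "bumiputera", "non bumiputera", "chinese"),
  Old_Name_List = c("bumiputera (muslims)", "bumiputera (other)",
                    "non bumiputera (indigenous)", "chinese"),
  stringsAsFactors = FALSE
)

# Hypothetical y whose rows are in a different order than x.
y <- data.frame(
  EPR_group_lower_2 = c("chinese", "bumiputera (other)",
                        "non bumiputera (indigenous)", "bumiputera (muslims)"),
  stringsAsFactors = FALSE
)

# Compute the recoded name once per row of x.
dup <- duplicated(x$New_Name_List) | duplicated(x$New_Name_List, fromLast = TRUE)
x$recoded <- ifelse(dup, x$Old_Name_List, x$New_Name_List)

# Look up each y row in x by its old name; unmatched rows become NA.
idx <- match(y$EPR_group_lower_2, x$Old_Name_List)
y$EPR_group_lower_3 <- x$recoded[idx]
y$EPR_group_lower_3
# [1] "chinese"  "bumiputera (other)"  "non bumiputera"  "bumiputera (muslims)"
```

When x and y are already aligned row by row, as in the question, the positional versions above are simpler and give the same result.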

Data

x <- structure(list(New_Name_List = c("bumiputera", "bumiputera", 
"non bumiputera", "chinese"), Old_Name_List = c("bumiputera (muslims)", 
"bumiputera (other)", "non bumiputera (indigenous)", "chinese"
)), class = "data.frame", row.names = c(NA, -4L))

y <- structure(list(EPR_country_code = c(835L, 835L, 835L, 835L), 
EPR_country = c("Brunei", "Brunei", "Brunei", "Brunei"), 
EPR_group_lower_2 = c("bumiputera (muslims)", "bumiputera (other)", 
"non bumiputera (indigenous)", "chinese")), class = "data.frame", 
row.names = c(NA, -4L))
