dplyr string match and replace based on lookup table in R
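I am trying to reproduce something I previously did in Excel, but I can't find a way to do it in R.
I have two datasets: a base dataset and a lookup table.
My base dataset has two columns, first name and surname. My lookup table has the same two columns, plus a column of replacement first names.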
People <- data.frame(
Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)
Lookup <- data.frame(
Fname = c("Tom","Tom","Rod","Rod"),
Sname = c("Harper","Kingston","Baker","Lombardy"),
NewFname = c("Tommy","Tim","Roderick","Robert")
)
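What I want to do is replace Fname with NewFname, subject to two conditions: both Fname and Sname must match between the two data frames. I need this because my real dataset has another 40,000 rows to process. In the end, I want the following data frame: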
People <- data.frame(
Fname = c("Tommy","Tim","Jerry","Ben","Roderick","John","Perry","Robert"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy")
)
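However, I want a general solution so that I don't have to type each condition and replacement name by hand. So far I have the following (broken) attempt, which uses mutate in dplyr to create a new column, but it doesn't work: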
People %>%
mutate(NewName = if_else(
Fname == Lookup$Fname & Sname == Lookup$Sname, NewFname, Fname
))
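A likely reason the attempt above fails: == compares the two columns position by position, recycling the 4-row Lookup vectors along the 8-row People data frame, instead of searching the lookup table for a matching (Fname, Sname) pair. NewFname is also not a column of People, so mutate() cannot find it. The base R sketch below uses the question's data (with stringsAsFactors = FALSE assumed) to show the positional comparison missing "Rod Baker"; the tab separator in the pair key is an illustrative assumption that no name contains a tab.

```r
People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)
Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)

# Row i of People is compared only with row ((i - 1) %% 4) + 1 of Lookup
# (the shorter vectors are recycled), not with every row of Lookup
positional <- People$Fname == Lookup$Fname & People$Sname == Lookup$Sname
print(positional)
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE

# What is actually wanted: does the (Fname, Sname) pair occur anywhere in Lookup?
# (the "\t" separator assumes no name contains a tab character)
pair_found <- paste(People$Fname, People$Sname, sep = "\t") %in%
  paste(Lookup$Fname, Lookup$Sname, sep = "\t")
print(pair_found)
# [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
```

Row 5 (Rod Baker) is in the lookup table, but the positional comparison lines it up with Tom Harper and reports no match.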
Just use left_join, then mutate conditioned on !is.na():
library(dplyr)
People %>%
left_join(Lookup, by = c("Fname", "Sname")) %>%
mutate(Fname = ifelse(!is.na(NewFname), NewFname, Fname))
# Fname Sname NewFname
# 1 Tommy Harper Tommy
# 2 Tim Kingston Tim
# 3 Jerry Ribery <NA>
# 4 Ben Ghazali <NA>
# 5 Roderick Baker Roderick
# 6 John Falcon <NA>
# 7 Perry Jefferson <NA>
# 8 Robert Lombardy Robert
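The same "use NewFname where the join found a match, otherwise keep Fname" rule can also be written with dplyr's coalesce(), which returns the first non-missing value element-wise. A minimal sketch, assuming a reasonably recent dplyr and character (not factor) name columns; it is an equivalent variant of the ifelse() line above, not a different approach:

```r
library(dplyr)

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)
Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)

result <- People %>%
  # Unmatched rows get NA in NewFname
  left_join(Lookup, by = c("Fname", "Sname")) %>%
  # Take NewFname when it is not NA, otherwise fall back to the original Fname
  mutate(Fname = coalesce(NewFname, Fname))

print(result$Fname)
```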
I left the NewFname column in only to make it clear what is going on.
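Once the result looks right, the helper column can be dropped with select(-NewFname). With a large base table, it may also be worth checking first that each (Fname, Sname) pair appears only once in Lookup: left_join() returns one output row per matching lookup row, so a duplicated pair would silently duplicate rows of People. A sketch of both steps; failing with stopifnot() on duplicates is an assumption about how you might want to handle them:

```r
library(dplyr)

People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)
Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)

# Stop early if any (Fname, Sname) key is repeated in the lookup table,
# since each repeat would add an extra row to the joined result
stopifnot(anyDuplicated(Lookup[c("Fname", "Sname")]) == 0)

People_clean <- People %>%
  left_join(Lookup, by = c("Fname", "Sname")) %>%
  mutate(Fname = ifelse(!is.na(NewFname), NewFname, Fname)) %>%
  select(-NewFname)  # drop the helper column

print(People_clean)
```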
Data (stringsAsFactors = FALSE keeps the name columns as character vectors rather than factors):
People <- data.frame(
Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"), stringsAsFactors = FALSE
)
Lookup <- data.frame(
Fname = c("Tom","Tom","Rod","Rod"),
Sname = c("Harper","Kingston","Baker","Lombardy"),
NewFname = c("Tommy","Tim","Roderick","Robert"), stringsAsFactors = FALSE
)
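If you would rather avoid the dplyr dependency, the same Excel-style lookup can be sketched in base R with match() on a combined key. This is an alternative technique, not the answer's method; building the key with paste(..., sep = "\t") assumes no name contains a tab character. Unlike left_join(), match() uses only the first matching lookup row, so a duplicated key never adds rows.

```r
People <- data.frame(
  Fname = c("Tom","Tom","Jerry","Ben","Rod","John","Perry","Rod"),
  Sname = c("Harper","Kingston","Ribery","Ghazali","Baker","Falcon","Jefferson","Lombardy"),
  stringsAsFactors = FALSE
)
Lookup <- data.frame(
  Fname = c("Tom","Tom","Rod","Rod"),
  Sname = c("Harper","Kingston","Baker","Lombardy"),
  NewFname = c("Tommy","Tim","Roderick","Robert"),
  stringsAsFactors = FALSE
)

# One key per row from the two matching columns (assumes no tabs in names)
people_key <- paste(People$Fname, People$Sname, sep = "\t")
lookup_key <- paste(Lookup$Fname, Lookup$Sname, sep = "\t")

# Row position of each People key in Lookup, or NA when there is no match
idx <- match(people_key, lookup_key)
hit <- !is.na(idx)

# Overwrite Fname only for rows that found a match
People$Fname[hit] <- Lookup$NewFname[idx[hit]]
print(People)
```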