R: How do I substitute some old strings in a dataframe with new strings?

I have a data frame with a list of students:

No   name      class
1    Isaac     Physics
2    Napoleon  History
3    Sigmund   Psychology
4    Ludwig    Music
5    LeBron    Sport
6    Jeff      Economy

I want to rename some of the students. The new names are in a second data frame:

No   Old        New
1    Isaac      Newton
2    Sigmund    Freud
3    LeBron     James

So the student data should end up looking like this:

No   name      class
1    Newton    Physics
2    Napoleon  History
3    Freud     Psychology
4    Ludwig    Music
5    James     Sport
6    Jeff      Economy

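I could use substitute, but that is too time consuming. I would like to do it quickly using the second data frame, which holds the database of new names. How can I do that?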

We can use a join on the 'name' column of the first dataset and the 'Old' column of the second, and assign 'New' from the second dataset to the 'name' column

library(data.table)
setDT(df1)[df2, name := New, on = .(name = Old)]

-output

 df1
   No     name      class
1:  1   Newton    Physics
2:  2 Napoleon    History
3:  3    Freud Psychology
4:  4   Ludwig      Music
5:  5    James      Sport
6:  6     Jeff    Economy

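NOTE: With data.table this is efficient, because := updates the matching rows of df1 by reference instead of building a new data frame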
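
To make the matching logic behind the join easier to follow, here is a rough base R sketch of the same "update only where a match exists" idea using match(). It is not how data.table implements the join; the names students, renames, idx and hit are made up for illustration.

```r
# Base R sketch of the update-on-match logic (no packages needed).
students <- data.frame(
  No    = 1:6,
  name  = c("Isaac", "Napoleon", "Sigmund", "Ludwig", "LeBron", "Jeff"),
  class = c("Physics", "History", "Psychology", "Music", "Sport", "Economy"),
  stringsAsFactors = FALSE
)
renames <- data.frame(
  Old = c("Isaac", "Sigmund", "LeBron"),
  New = c("Newton", "Freud", "James"),
  stringsAsFactors = FALSE
)

# For each student, the row of `renames` whose Old equals the name (NA if none).
idx <- match(students$name, renames$Old)

# Overwrite only the rows that found a match; the others keep their name.
hit <- !is.na(idx)
students$name[hit] <- renames$New[idx[hit]]

print(students$name)
```

Rows without an entry in renames (Napoleon, Ludwig, Jeff) are left untouched, which is the same behaviour the join with := gives.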


Or use coalesce

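library(dplyr)
# look up each name in an Old -> New named vector; unmatched names give NA,
# and coalesce() then falls back to the original name
df1$name <- coalesce(unname(setNames(df2$New, df2$Old)[df1$name]), df1$name)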
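
The one-liner packs two steps together, so here is a small sketch that splits them apart. The vectors old_names and lookup are hypothetical sample values, not the data from the question.

```r
library(dplyr)

old_names <- c("Isaac", "Napoleon", "Sigmund")   # hypothetical sample input
lookup <- setNames(c("Newton", "Freud"), c("Isaac", "Sigmund"))

# Step 1: indexing a named vector by name returns NA where the name is absent.
found <- unname(lookup[old_names])
# found is c("Newton", NA, "Freud")

# Step 2: coalesce() takes the first non-missing value at each position,
# so the NA for "Napoleon" is filled from the original vector.
result <- coalesce(found, old_names)
print(result)
```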

data

df1 <- structure(list(No = 1:6, name = c("Isaac", "Napoleon", "Sigmund", 
"Ludwig", "LeBron", "Jeff"), class = c("Physics", "History", 
"Psychology", "Music", "Sport", "Economy")), class = "data.frame", 
row.names = c(NA, 
-6L))

df2 <- structure(list(No = 1:3, Old = c("Isaac", "Sigmund", "LeBron"
), New = c("Newton", "Freud", "James")), class = "data.frame", row.names = c(NA, 
-3L))

Using tidyverse

library(tidyverse)
df$name <- recode(df$name, !!!deframe(new[c("Old","New")]))

Output

  No     name      class
1  1   Newton    Physics
2  2 Napoleon    History
3  3    Freud Psychology
4  4   Ludwig      Music
5  5    James      Sport
6  6     Jeff    Economy

How it works

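  1. deframe converts a two-column data frame into a named vector (first column as names, second column as values).
  2. !!! is special syntax that splices the named vector into recode as individual Old = "New" arguments, which recode then applies to df$name.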
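
The two steps above can be sketched on a tiny example. The objects new_names and x are hypothetical; the sketch only assumes that dplyr and tibble are installed.

```r
library(dplyr)   # recode
library(tibble)  # deframe

new_names <- data.frame(Old = c("Isaac", "Sigmund"),
                        New = c("Newton", "Freud"),
                        stringsAsFactors = FALSE)
x <- c("Isaac", "Napoleon", "Sigmund")

# Step 1: the first column becomes the names, the second column the values.
lookup <- deframe(new_names)
# lookup is c(Isaac = "Newton", Sigmund = "Freud")

# Step 2: !!! splices the named vector into separate named arguments,
# so the two calls below do the same thing.
spliced <- recode(x, !!!lookup)
manual  <- recode(x, Isaac = "Newton", Sigmund = "Freud")

print(spliced)
```

Values with no entry in the lookup ("Napoleon" here) are left unchanged by recode.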

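Note: tidyverse is a collection of very useful packages for data science/manipulation, and loading it attaches several packages. deframe comes from the tibble library; recode comes from dplyr.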

Data

df <- structure(list(No = 1:6, name = c("Isaac", "Napoleon", "Sigmund", 
"Ludwig", "LeBron", "Jeff"), class = c("Physics", "History", "Psychology", 
"Music", "Sport", "Economy")), row.names = c(NA, -6L), class = "data.frame")

new <- structure(list(No = 1:3, Old = c("Isaac", "Sigmund", "LeBron"
), New = c("Newton", "Freud", "James")), class = "data.frame", row.names = c(NA, 
-3L))