R: How do I replace some old strings in a data frame with new strings?
I have a data frame with a list of students:
No name class
1 Isaac Physics
2 Napoleon History
3 Sigmund Psychology
4 Ludwig Music
5 LeBron Sport
6 Jeff Economy
I want to rename some of the students. The new names are in a second data frame:
No Old New
1 Isaac Newton
2 Sigmund Freud
3 LeBron James
so that the student data ends up looking like this:
No name class
1 Newton Physics
2 Napoleon History
3 Freud Psychology
4 Ludwig Music
5 James Sport
6 Jeff Economy
I can use substitute, but it takes too much time. I would like to do it quickly by using the second data frame, which holds the new names. How can I do this?
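We could use a join on the 'name' column of the first dataset and the 'Old' column of the second, and assign 'New' from the second dataset to the 'name' column of the first: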
library(data.table)
setDT(df1)[df2, name := New, on = .(name = Old)]
Output
df1
No name class
1: 1 Newton Physics
2: 2 Napoleon History
3: 3 Freud Psychology
4: 4 Ludwig Music
5: 5 James Sport
6: 6 Jeff Economy
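Note: with data.table this can be done efficiently, because := updates the matched rows of df1 by reference instead of copying the whole table. setDT also converts df1 to a data.table in place.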
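For readers who do not have data.table installed, the same join-and-assign idea can be sketched in base R with match(). This is only an illustrative alternative, not the code from the answer above. It reuses the df1 and df2 names from the question, and the helper variables idx and matched are made up for this sketch.

```r
# Minimal base R sketch of "join on name = Old, then assign New to name".
df1 <- data.frame(
  No = 1:6,
  name = c("Isaac", "Napoleon", "Sigmund", "Ludwig", "LeBron", "Jeff"),
  class = c("Physics", "History", "Psychology", "Music", "Sport", "Economy"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  No = 1:3,
  Old = c("Isaac", "Sigmund", "LeBron"),
  New = c("Newton", "Freud", "James"),
  stringsAsFactors = FALSE
)

# For each name in df1, find its row position in df2$Old (NA if absent).
idx <- match(df1$name, df2$Old)

# Overwrite only the rows that found a partner in df2.
matched <- !is.na(idx)
df1$name[matched] <- df2$New[idx[matched]]

print(df1)
```

Unlike the data.table version, this modifies a copy of the name column rather than updating by reference, which is usually fine for small tables.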
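Or use coalesce: look up each name in a named vector built from df2, and fall back to the original name wherever the lookup returns NA: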
library(dplyr)
df1$name <- coalesce(setNames(df2$New, df2$Old)[df1$name], df1$name)
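To see why the one-liner works, it may help to split it into its steps. The sketch below uses only base R: ifelse() with is.na() stands in for coalesce() here, and the variable names lookup, hit, and filled are invented for illustration.

```r
# Step-by-step view of the named-vector lookup, in base R.
old_names <- c("Isaac", "Napoleon", "Sigmund", "Ludwig", "LeBron", "Jeff")
df2 <- data.frame(
  Old = c("Isaac", "Sigmund", "LeBron"),
  New = c("Newton", "Freud", "James"),
  stringsAsFactors = FALSE
)

# 1. Named vector: the names are the old strings, the values the new ones.
lookup <- setNames(df2$New, df2$Old)

# 2. Indexing by name returns the new string, or NA when there is no entry.
hit <- unname(lookup[old_names])

# 3. Keep the original wherever the lookup gave NA (the job coalesce() does).
filled <- ifelse(is.na(hit), old_names, hit)

print(hit)
print(filled)
```

Names that have no entry in df2 pass through unchanged, which is why the fallback argument is needed.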
Data
df1 <- structure(list(No = 1:6, name = c("Isaac", "Napoleon", "Sigmund",
"Ludwig", "LeBron", "Jeff"), class = c("Physics", "History",
"Psychology", "Music", "Sport", "Economy")), class = "data.frame",
row.names = c(NA,
-6L))
df2 <- structure(list(No = 1:3, Old = c("Isaac", "Sigmund", "LeBron"
), New = c("Newton", "Freud", "James")), class = "data.frame", row.names = c(NA,
-3L))
Using tidyverse:
library(tidyverse)
df$name <- recode(df$name, !!!deframe(new[c("Old","New")]))
Output
No name class
1 1 Newton Physics
2 2 Napoleon History
3 3 Freud Psychology
4 4 Ludwig Music
5 5 James Sport
6 6 Jeff Economy
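How it works
deframe converts a two-column data frame into a named vector, with the first column as names and the second as values. !!! is the splice operator: it unpacks that named vector into individual old = "new" arguments, which recode then applies to df$name. Values without a matching name are left unchanged.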
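Note: tidyverse is a collection of packages that is very useful for data science and data manipulation. Loading it attaches several packages. deframe comes from the tibble package, and recode comes from dplyr.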
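As a rough illustration of the splicing step, the sketch below compares the spliced call with a recode call whose arguments are written out by hand. It assumes dplyr and tibble are installed. The names lookup, names_in, spliced, and explicit are invented for this example.

```r
library(dplyr)
library(tibble)

new <- data.frame(
  No = 1:3,
  Old = c("Isaac", "Sigmund", "LeBron"),
  New = c("Newton", "Freud", "James"),
  stringsAsFactors = FALSE
)

# deframe(): first column becomes the names, second column the values.
lookup <- deframe(new[c("Old", "New")])
print(lookup)

names_in <- c("Isaac", "Napoleon", "Sigmund")

# Splicing the named vector into recode()...
spliced <- recode(names_in, !!!lookup)

# ...should match writing every old = "new" pair out by hand.
explicit <- recode(names_in, Isaac = "Newton", Sigmund = "Freud", LeBron = "James")

print(identical(spliced, explicit))
```

Building the vector from the data frame keeps the mapping in one place, so adding a row to new is enough to extend the renaming.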
Data
df <- structure(list(No = 1:6, name = c("Isaac", "Napoleon", "Sigmund",
"Ludwig", "LeBron", "Jeff"), class = c("Physics", "History", "Psychology",
"Music", "Sport", "Economy")), row.names = c(NA, -6L), class = "data.frame")
new <- structure(list(No = 1:3, Old = c("Isaac", "Sigmund", "LeBron"
), New = c("Newton", "Freud", "James")), class = "data.frame", row.names = c(NA,
-3L))