Finding and replacing text in R
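I recently started learning R and am trying to explore it further by automating my processes. Below is some sample data; I am trying to create a new column by finding and replacing specific text in the labels (colname: Designations).
Because I handle a lot of new data, I want to automate this in R rather than use Excel formulas.
Dataset: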
strings<-c("Zonal Manager","Department Manager","Network Manager","Head of Sales","Account Manager","Alliance Manager","Additional Manager","Senior Vice President","General manager","Senior Analyst", "Solution Architect","AGM")
The R code I used:
t<-data.frame(strings,stringsAsFactors = FALSE)
colnames(t)[1]<-"Designations"
y<-sub(".*Manager*","Manager",strings,ignore.case = TRUE)
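Challenge:
With this, all the matching data is changed to Manager, but I also need to replace the other designations with their main theme.
I tried ifelse statements, grep, grepl, str, sub, etc., but I did not get what I am looking for.
I cannot use the first/second/last word (as with a delimiter), because the main theme appears in different positions. For example: Chief Information Officer, Commercial Finance Manager, or AGM.
Excel work:
I have already coded 300 main themes, such as...
Manager (for all GM, Asst. Manager, Sales Manager, etc.)
Architect (Solution Arch, Sr. Arch, etc.)
Director (Senior Director, Director, Asst. Director, etc.)
Senior Analyst
Analyst
Head (Head of Sales)
What I am looking for:
I need to create a new column in which the text is replaced with the relevant theme, just as I did in Excel, but using R.
I am also fine with matching against the themes I have already coded in Excel (like VLOOKUP in Excel), as long as it is done in R.
Expected result:
[image]
Thanks in advance for your help!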
Yes, that is exactly what I expected. Thank you!! But when I try the same approach after uploading a new dataset (an Excel file), using
df %>%
mutate(theme=gsub(".*(Manager|Lead|Director|Head|Administrator|Executive|Executive|VP|President|Consultant|CFO|CTO|CEO|CMO|CDO|CIO|COO|Cheif Executive Officer|Chief Technological Officer|Chief Digital Officer|Chief Financial Officer|Chief Marketing Officer|Chief Digital Officer|Chief Information Officer,Chief Operations Officer)).*","\1",Designations,ignore.case = TRUE))
it does not work. Is there something else I should correct?
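One possible reading of why the longer pattern fails: as posted, it has an extra closing parenthesis, a comma where a | was probably intended between the last two alternatives, and a replacement written as "\1" where an R string needs "\\1". A minimal sketch, assuming those are the issues, builds the alternation from a character vector so the separators and parentheses cannot drift. The names themes and pattern, the shortened theme list, and the sample df are illustrative and not from the original post.

```r
# Hypothetical subset of themes; extend with the full list as needed.
themes <- c("Manager", "Lead", "Director", "Head", "VP", "President",
            "CIO", "Chief Information Officer", "Chief Operations Officer")

# paste() with collapse = "|" puts exactly one separator between themes,
# and the capture group is opened and closed exactly once.
pattern <- paste0(".*(", paste(themes, collapse = "|"), ").*")

# Illustrative data standing in for the uploaded Excel sheet.
df <- data.frame(
  Designations = c("Zonal Manager", "Senior VP", "Chief Operations Officer"),
  stringsAsFactors = FALSE
)

# "\\1" is the R string literal for the backreference \1.
df$theme <- gsub(pattern, "\\1", df$Designations, ignore.case = TRUE)
df
```

Keeping the themes in a vector also makes it easier to spot duplicates or misspellings before they end up inside the regular expression.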
Do you mean something like this?
library(dplyr)

strings <-
  c(
    "Zonal Manager",
    "Department Manager",
    "Network Manager",
    "Head of Sales",
    "Account Manager",
    "Alliance Manager",
    "Additional Manager",
    "Senior Vice President",
    "General manager",
    "Senior Analyst",
    "Solution Architect",
    "AGM"
  )
df <- data.frame(Designations = strings)

df %>%
  mutate(
    theme = gsub(
      # capture the theme word and drop everything around it
      ".*(manager|head|analyst|architect|agm|director|president).*",
      "\\1",
      Designations,
      ignore.case = TRUE
    )
  )
#> Designations theme
#> 1 Zonal Manager Manager
#> 2 Department Manager Manager
#> 3 Network Manager Manager
#> 4 Head of Sales Head
#> 5 Account Manager Manager
#> 6 Alliance Manager Manager
#> 7 Additional Manager Manager
#> 8 Senior Vice President President
#> 9 General manager manager
#> 10 Senior Analyst Analyst
#> 11 Solution Architect Architect
#> 12 AGM AGM
Created on 2018-10-04 by the reprex package (v0.2.1)
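Two details of the gsub() approach are visible in the output above: the captured text keeps its original casing (row 9 gives "manager"), and gsub() returns a string unchanged when the pattern does not match it. A small sketch of one way to handle both, assuming lower-case themes and NA for unknown titles are acceptable; the title "Intern" and the variable names matched and theme are made up for illustration.

```r
# "Intern" is an invented title that contains none of the known themes.
strings <- c("Zonal Manager", "General manager", "Head of Sales", "Intern")

pattern <- ".*(manager|head|analyst|architect|agm|director|president).*"

# grepl() flags which titles contain a known theme at all.
matched <- grepl(pattern, strings, ignore.case = TRUE)

# Extract the theme, then normalise it to lower case so that
# "Manager" and "manager" fall into the same group.
theme <- tolower(gsub(pattern, "\\1", strings, ignore.case = TRUE))

# Titles without any known theme become NA instead of passing through.
theme[!matched] <- NA_character_
theme
```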
Data:
strings<-c("Zonal Manager","Department Manager","Network Manager","Head of Sales","Account Manager",
"Alliance Manager","Additional Manager","Senior Vice President","General manager","Senior Analyst", "Solution Architect","AGM")
You need to prepare a good lookup table (complete it yourself and make it perfect):
lu_table <- data.frame(new = c("Manager", "Architect","Director"), old = c("Manager|GM","Architect|Arch","Director"), stringsAsFactors = FALSE)
Then you can let mapply do the work:
mapply(function(new,old) {ans <- strings; ans[grepl(old,ans)]<-new; strings <<- ans; return(NULL)}, new = lu_table$new, old = lu_table$old)
Now look at strings:
> strings
[1] "Manager" "Manager" "Manager" "Head of Sales" "Manager" "Manager"
[7] "Manager" "Senior Vice President" "General manager" "Senior Analyst" "Architect" "Manager"
Please note:
This solution uses <<-, so it may not be the best solution, but it works in this case.
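For comparison, here is a minimal sketch of the same lookup-table idea without <<-. It uses an ordinary loop inside a function that returns a new vector instead of modifying strings in place. The function name recode_titles and the result name themes are hypothetical. Unlike the mapply version, each pattern is tested against the original titles rather than the partially replaced ones; with this lookup table the result is the same.

```r
strings <- c("Zonal Manager", "Department Manager", "Network Manager",
             "Head of Sales", "Account Manager", "Alliance Manager",
             "Additional Manager", "Senior Vice President", "General manager",
             "Senior Analyst", "Solution Architect", "AGM")

lu_table <- data.frame(
  new = c("Manager", "Architect", "Director"),
  old = c("Manager|GM", "Architect|Arch", "Director"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a recoded copy of x and leaves x untouched.
recode_titles <- function(x, lookup) {
  out <- x
  for (i in seq_len(nrow(lookup))) {
    # Test each pattern against the original titles, so an earlier
    # replacement cannot influence a later pattern.
    out[grepl(lookup$old[i], x)] <- lookup$new[i]
  }
  out
}

themes <- recode_titles(strings, lu_table)
themes
```

As in the original, matching is case-sensitive, which is why "General manager" is left as it is; passing ignore.case = TRUE to grepl() would be one way to catch it.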