How to replace '+' using the gsub() function in R
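I am trying to remove the "+" character from one of the string elements in a data frame, but I can't find a way to do it.
Here is the data frame: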
txtdf <- structure(list(ID = 1:9, Var1 = structure(c(1L, 1L, 1L, 1L, 4L,
5L, 5L, 2L, 3L), .Label = c("government", "parliament", "parliment",
"poli+tician", "politician"), class = "factor")), .Names = c("ID",
"Var1"), class = "data.frame", row.names = c(NA, -9L))
# ID Var1
# 1 government
# 2 government
# 3 government
# 4 government
# 5 poli+tician
# 6 politician
# 7 politician
# 8 parliament
# 9 parliment
I tried two approaches; neither gave the expected result:
Approach 1
txtdf <- gsub("[:punct:]","", txtdf)
# [1] "goverme" "goverme" "goverme" "goverme" "oli+iia" "oliiia" "oliiia"
# [8] "arliame" "arlime"
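I don't understand what went wrong here. I expected only the "+" in the 5th element to be removed, but every element was altered as shown above.
Approach 2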
txtdf<-gsub("*//+","",txtdf)
# [1] "government" "government" "government" "government" "poli+tician"
# [6] "politician" "politician" "parliament" "parliment"
Nothing changed at all here. What I was attempting was to escape the + character with a double slash.
Just do the replacement with fixed = TRUE (no regular expression needed). Note that you have to apply it to each column of the data.frame by specifying the column name:
txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf
which gives
job
1 government
2 poli+tician
3 parliament
Now replace the "+":
txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf
The result is:
job
1 government
2 politician
3 parliament
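The Var1 column in the question is a factor, and the data frame has more than one column. As a minimal sketch, assuming you want the same fixed = TRUE replacement on every character or factor column and are happy for factors to come back as character vectors, something like the following could work. The helper name strip_plus is hypothetical, not part of base R.

```r
# Hypothetical helper: remove literal "+" from all character/factor columns.
strip_plus <- function(df) {
  # Flag the columns that hold text (character vectors or factors)
  is_text <- vapply(df, function(col) is.character(col) || is.factor(col),
                    logical(1))
  df[is_text] <- lapply(df[is_text], function(col) {
    # gsub() returns a character vector, so factor columns become character
    gsub("+", "", col, fixed = TRUE)
  })
  df
}

txtdf <- data.frame(ID = 1:3,
                    Var1 = factor(c("government", "poli+tician", "parliament")))
cleaned <- strip_plus(txtdf)
print(cleaned)
```

Numeric columns such as ID are left untouched because only the text columns are passed to gsub().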
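You need to escape the plus sign. "+" has a special meaning in regular expressions (it is a quantifier), so an unescaped "+" in a pattern is not matched as a literal character. From the documentation, ?regex: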
"+" The preceding item will be matched one or more times.
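To match such special characters literally, you need to escape them so that their special meaning is not applied. In R, the escape takes two backslashes (\\), because the backslash must itself be escaped inside a string literal. So in your case it would look like this:
gsub("\\+","",df$job)
Running the above removes all plus signs from the data, which gives you the desired result.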
So suppose your df is:
df <- data.frame(job = c("government", "poli+tician","politician", "parliament"))
Then your output will be:
> gsub("\\+","",df$job)
[1] "government" "politician" "politician"
[4] "parliament"
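For comparison, here is a short sketch of a few regular-expression spellings that match a literal plus sign, along with a reproduction of what the first attempt in the question actually did. The vector name jobs is made up for illustration. Note that "[[:punct:]]" removes all punctuation, not just "+", so it is only equivalent when "+" is the only punctuation present.

```r
jobs <- c("government", "poli+tician", "politician", "parliament")

# Three ways to match a literal plus sign with a regular expression
escaped   <- gsub("\\+", "", jobs)          # backslash escape, doubled inside an R string
bracketed <- gsub("[+]", "", jobs)          # inside a bracket expression "+" is literal
posix     <- gsub("[[:punct:]]", "", jobs)  # POSIX class; also strips other punctuation

# With single brackets, "[:punct:]" is just the set of characters
# ":", "p", "u", "n", "c", "t", which is why letters disappeared
wrong <- gsub("[:punct:]", "", jobs)

print(escaped)
print(wrong)
```

The last call shows why every element changed in the first attempt: the POSIX class name only has its special meaning when written inside an outer pair of brackets.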