Including/maintaining spaces in gsub with R
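This may be a basic question that has already been answered, but I couldn't find a post that addresses it directly.
Goal: replace specific words in a text/character vector with "" or "1:n", preferably using gsub, though I'm open to other ideas.
Details:
I run a loop of gsub calls, which does the job, except that it replaces every "i", when I only want " i " (with a space before and after) to be replaced and every other "i" inside words left alone. The same goes for "to": only the complete word should be replaced.
=> "i went to town indigo" should become "went town indigo",
but the current code (in principle) turns it into "went own nd go".
Any input would be appreciated, thanks!
Here is the loop I wrote in R (again, it may be outdated/inefficient):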
a1 <- NULL
for(j in 1:length(xt1)) {
for (i in 1:length(st1)) {
xt1[j] = gsub(st1[i], " ", xt1[j])
}
a1[j] = gsub(st1[i], " ", xt1[j])
}
head(a1)
st1
st1
[1] "u" "e" "to" "the" "a" "and" "you" "for" "of" "i"
xt1
xt1
[1] "Nice to see Sofia Kenin get title No.5 in Lyon, especially so soon after her #AusOpen triumph. Always shows up for the tour events and battled hard, through four consecutive three-setters, to earn the title."
[2] "@KeepUKtogether @waltersboy_ Positive destinations for those leaving school are increasingly good. Many go into excellent apprenticeships such as those I met on Friday. Able to study and earn money at the same time. Win win."
[3] "@kamaalrkhan @iTIGERSHROFF Tiger has earn that much stardom to make film hit"
[4] "@ComfortablySmug Women only earn 23 hours to a man's 24. We need to end the hour gap in this country."
[5] "@ByMikeBaker @GlenBikes I think @MayorJenny wants to solve a bigger crisis so she looks good for elections! Or maybe theirs just too many donors counting the money they'll earn or lose based on the decision!"
[6] "Ravens dump Mustangs, earn another national final appearance"
[7] "@JaredRBLX Yeah this is my problem.. in my new build my staircase alone was over 10,000 it takes so long to earn Then I lose motivation or ideas on what I'm doing.. So now saving before anymore building "
[8] "All i do is yearn a life without a concern, and dream of having a turn to earn money to burn mapping out my strategies to get rich. My desire is like a scratch that needs to get itched..."
a1
head(a1)
[1] "N c s S K n n g t t tl No.5 n Lyon, sp c lly so soon ft r h r #A sOp n tr mph. Alw ys shows p th r v nts nd b ttl d h rd, thro gh fo r cons c t v thr -s tt rs, rn th t tl ."
[2] "@K pUK g th r @w lt rsboy_ Pos t v d st n t ons thos l v ng school r ncr s ngly good. M ny go n xc ll nt ppr nt c sh ps s ch s thos I m t on Fr d y. Abl st dy nd rn mon y t th s m t m . W n w n."
[3] "@k m lrkh n @ TIGERSHROFF T g r h s rn th t m ch st rdom m k f lm h t"
[4] "@Com t blySm g Wom n only rn 23 ho rs m n's 24. W n d nd th ho r g p n th s co ntry."
[5] "@ByM k B k r @Gl nB k s I th nk @M yorJ nny w nts solv b gg r cr s s so sh looks good l ct ons! Or m yb th rs j st o m ny donors co nt ng th mon y th y'll rn or los b s d on th d c s on!"
[6] "R v ns d mp M st ngs, rn noth r n t on l f n l pp r nc "
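In regular expressions, \b is a "word boundary": roughly, the transition between a word character and a non-word character (a space, a line break, sentence-ending punctuation, etc.). So you need a word boundary on both sides of each pattern. Inside an R string the backslash has to be doubled, so it is written "\\b".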
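As a small illustration of the difference a word boundary makes, here is a sketch using the example sentence from the question. The variable names `no_boundary` and `with_boundary` are made up for this example.

```r
x <- "i went to town indigo"

# Without boundaries, every "i" is removed, including those inside words.
no_boundary <- gsub("i", "", x)
no_boundary
# [1] " went to town ndgo"

# With \\b on both sides, only the standalone word "i" matches.
with_boundary <- gsub("\\bi\\b", "", x)
with_boundary
# [1] " went to town indigo"
```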
We can also do better than a for loop: if you join the patterns with |, which means "or", you can do all of the replacements in a single call.
st1 = c("u", "e", "to", "the", "a", "and", "you", "for", "of", "i")
st1b = paste0("\\b", st1, "\\b", collapse = "|") ## the backslash is doubled because R strings need it escaped
gsub(st1b, "", "i went to town indigo")
# [1] " went  town indigo"
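This leaves extra spaces behind. You can clean them up with another call such as gsub(" +", " ", x), which collapses each run of spaces into a single space, and trimws(x) for any leading or trailing space.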
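Putting the replacement and the cleanup together, a minimal sketch could look like the following. The helper name `remove_words` is hypothetical and not part of the original answer; it only wraps the steps described above.

```r
st1 <- c("u", "e", "to", "the", "a", "and", "you", "for", "of", "i")
pattern <- paste0("\\b", st1, "\\b", collapse = "|")

# Hypothetical helper: remove whole-word matches, then tidy the whitespace.
remove_words <- function(x, pattern) {
  x <- gsub(pattern, "", x)  # drop whole-word matches
  x <- gsub(" +", " ", x)    # collapse runs of spaces to a single space
  trimws(x)                  # strip leading/trailing whitespace
}

result <- remove_words("i went to town indigo", pattern)
result
# [1] "went town indigo"
```

Because gsub and trimws are vectorized, the same helper can be applied to a whole character vector such as xt1 without a loop.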
Building on Gregor's answer, you don't need a loop to do the replacements. You can use either gsub or str_replace_all from the stringr package:
library(stringr)
st1 = c("u", "e", "to", "the", "a", "and", "you", "for", "of", "i")
st1b = paste0("\\b", st1, "\\b", collapse = "|") ## the backslash is doubled because R strings need it escaped
xt1 <- c("Nice to see Sofia Kenin get title No.5 in Lyon, especially so soon after her #AusOpen triumph. Always shows up for the tour events and battled hard, through four consecutive three-setters, to earn the title.",
"@KeepUKtogether @waltersboy_ Positive destinations for those leaving school are increasingly good. Many go into excellent apprenticeships such as those I met on Friday. Able to study and earn money at the same time. Win win.",
"@kamaalrkhan @iTIGERSHROFF Tiger has earn that much stardom to make film hit",
"@ComfortablySmug Women only earn 23 hours to a man's 24. We need to end the hour gap in this country.",
"@ByMikeBaker @GlenBikes I think @MayorJenny wants to solve a bigger crisis so she looks good for elections! Or maybe theirs just too many donors counting the money they'll earn or lose based on the decision!",
"Ravens dump Mustangs, earn another national final appearance",
"@JaredRBLX Yeah this is my problem.. in my new build my staircase alone was over 10,000 it takes so long to earn Then I lose motivation or ideas on what I'm doing.. So now saving before anymore building ",
"All i do is yearn a life without a concern, and dream of having a turn to earn money to burn mapping out my strategies to get rich. My desire is like a scratch that needs to get itched...")
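# Use one of the two calls below; each replaces every match in every element of xt1.
a1 <- str_replace_all(xt1, st1b, "")  # stringr
a1 <- gsub(st1b, "", xt1)             # base R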
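One thing the sample data suggests, though the question does not ask about it: the word list is lowercase, so a capital "I" is left in place. If that is not wanted, a possible sketch is to pass ignore.case = TRUE to gsub. The `squish` helper and the test sentence are made up for this example.

```r
st1 <- c("u", "e", "to", "the", "a", "and", "you", "for", "of", "i")
pattern <- paste0("\\b", st1, "\\b", collapse = "|")
x <- "All i do and I think"

# Hypothetical helper: collapse runs of spaces and trim the ends.
squish <- function(s) trimws(gsub(" +", " ", s))

case_sensitive <- squish(gsub(pattern, "", x))
case_sensitive
# [1] "All do I think"

case_insensitive <- squish(gsub(pattern, "", x, ignore.case = TRUE))
case_insensitive
# [1] "All do think"
```

Note that an apostrophe counts as a non-word character, so with ignore.case = TRUE the "I" in "I'm" would also be removed, leaving "'m". Whether that is acceptable depends on the data.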