Stem a word using regex
I am trying to stem words in a text using regex.
c <- "Foo is down. No one wants Foos after this. Before, people liked Fooy a lot."
Desired output:
"Foo is down. No one wants Foo after this. Before, people liked Foo a lot."
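I need to keep the word Foo but remove any characters attached to the end of it, leaving the rest of the string intact.
I managed to split the suffix from the base of the word, and I can remove everything after a variant of the word "Foo". I also tried word boundaries, but I cannot figure out how to get the desired output.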
One possible regex for this replaces "Foo followed by one or more lowercase letters" with "Foo":
> x = "Foo is down. No one wants Foos after this. Before, people liked Fooy a lot."
> stringr::str_replace_all(x, "Foo[a-z]+", "Foo")
[1] "Foo is down. No one wants Foo after this. Before, people liked Foo a lot."
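Note that "Foo[a-z]+" matches only lowercase ASCII suffixes and is not anchored to the start of a word. As a minimal sketch of a slightly stricter variant in base R (assuming the stem should only match at the start of a word and that suffixes may contain letters of either case):

```r
x <- "Foo is down. No one wants Foos after this. Before, people liked Fooy a lot."

# \\b anchors the match at a word boundary, so "Foo" must start a word.
# [[:alpha:]]+ matches one or more letters, upper or lower case.
stemmed <- gsub("\\bFoo[[:alpha:]]+", "Foo", x, perl = TRUE)
print(stemmed)
# [1] "Foo is down. No one wants Foo after this. Before, people liked Foo a lot."
```

Punctuation after the word is left alone because it is not part of the letter class.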
We can try using gsub and replacing the pattern (?<=Foo)\S+ with an empty string. The lookbehind (?<=Foo) requires perl=TRUE:
x <- "Foo is down. No one wants Foos after this. Before, people liked Fooy a lot."
output <- gsub("(?<=Foo)\\S+", "", x, perl=TRUE)
output
[1] "Foo is down. No one wants Foo after this. Before, people liked Foo a lot."
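One caveat with \S+ is that it matches any non-whitespace character, so punctuation directly attached to the word is removed as well. The sketch below illustrates the difference on a made-up sentence (the string y is an assumption for illustration, not from the question) and shows a variant that restricts the suffix to letters:

```r
y <- "Nobody wants Foos. Everyone liked Fooy, though."

# \\S+ consumes the suffix AND the punctuation attached to it.
greedy <- gsub("(?<=Foo)\\S+", "", y, perl = TRUE)
print(greedy)
# [1] "Nobody wants Foo Everyone liked Foo though."

# [[:alpha:]]+ consumes only the letters after "Foo".
letters_only <- gsub("(?<=Foo)[[:alpha:]]+", "", y, perl = TRUE)
print(letters_only)
# [1] "Nobody wants Foo. Everyone liked Foo, though."
```

For the sentence in the question both patterns give the same result, since no suffixed form of Foo is followed directly by punctuation.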