Grep for whole word that starts with X in R

Question

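I need to remove certain words from various phrases, but because these words may be conjugated, plural, or possessive, I can only search on their first few letters. An example: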

example = "You are the elephant's friend."
gsub("\\beleph.*\\b", " _____ ", example)
[1] "You are the  _____ "

How can I match the whole word, starting from its first few letters?

Answer 1

gsub("\\beleph[[:alpha:][:punct:]]+\\b", "_____", example)
[1] "You are the _____ friend."

That works in this case.

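The change is to replace the greedy (and sometimes dangerous) match-anything ".*" with the character class "[[:alpha:][:punct:]]+", which matches one or more alphabetic or punctuation characters and therefore stops at the space after the word. See help(regex) for other ready-made character classes that may be useful, for example [:alnum:] in case any of the strings also contain digits.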
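As a sketch of how this generalises, the same pattern can be wrapped in a small function and applied to a whole vector of phrases at once, since gsub() is vectorised over its input. redact_prefix() and its chars argument are hypothetical names for illustration, not part of base R, and the sketch assumes the prefix contains no regex metacharacters. The last part shows the [:alnum:] point: a digit right after the prefix stops the default class before it reaches a word boundary, so that word is left untouched.

```r
# Hypothetical helper (not base R): replace every word that starts with
# `prefix`, using the character-class pattern from this answer.
# `chars` holds the POSIX classes allowed after the prefix.
# Assumes `prefix` contains no regex metacharacters.
redact_prefix <- function(x, prefix, replacement = "_____",
                          chars = "[:alpha:][:punct:]") {
  pattern <- paste0("\\b", prefix, "[", chars, "]+\\b")
  gsub(pattern, replacement, x)
}

phrases <- c("You are the elephant's friend.",
             "Two elephants walked by",
             "elephant seals are not elephants")

# gsub() is vectorised, so every phrase is handled in one call
res <- redact_prefix(phrases, "eleph")
print(res)
# "You are the _____ friend."  "Two _____ walked by"  "_____ seals are not _____"

# The digit in "elephant2" is not in [:alpha:][:punct:], so nothing matches
res_alpha <- redact_prefix("Model elephant2 escaped", "eleph")
# Allowing [:alnum:] lets the class run through the digit to the boundary
res_alnum <- redact_prefix("Model elephant2 escaped", "eleph",
                           chars = "[:alnum:][:punct:]")
print(c(res_alpha, res_alnum))
# "Model elephant2 escaped"  "Model _____ escaped"
```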

To also catch a match on the first word of the string, the following should work. Here is an example:

exampleYoda = "elephant's friend you be."

gsub("(\\b|^)eleph[[:alpha:][:punct:]]+\\b", "_____", exampleYoda)
[1] "_____ friend you be."

It also works on the original example:

gsub("(\\b|^)eleph[[:alpha:][:punct:]]+\\b", "_____", example)
[1] "You are the _____ friend."
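The leading boundary is what restricts the match to words that start with the prefix. A quick check, using "telephone" (which happens to contain "eleph" in the middle) as an illustrative input not taken from the question: with the boundary the word is left alone, without it the tail of the word gets replaced.

```r
# Same pattern as above, with and without the leading word boundary
with_boundary    <- "(\\b|^)eleph[[:alpha:][:punct:]]+\\b"
without_boundary <- "eleph[[:alpha:][:punct:]]+\\b"

x <- c("elephant's friend you be.",
       "You are the elephant's friend.",
       "Answer the telephone")   # "eleph" appears inside "telephone"

res_with    <- gsub(with_boundary, "_____", x)
res_without <- gsub(without_boundary, "_____", x)
print(res_with)
# "_____ friend you be."  "You are the _____ friend."  "Answer the telephone"
print(res_without)
# "_____ friend you be."  "You are the _____ friend."  "Answer the t_____"
```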

Answer 2

To make your original code work, you just need to make the quantifier non-greedy (lazy) by appending ? to it.
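To see what the greedy and lazy versions actually capture, base R's regexpr() (locate the first match) and regmatches() (extract the matched text) can be used. This is only a diagnostic sketch; the expected strings follow from the outputs shown in the question and in this answer.

```r
example <- "You are the elephant's friend."

# regexpr() locates the first match; regmatches() pulls out the matched text
greedy <- regmatches(example, regexpr("\\beleph.*\\b", example))
lazy   <- regmatches(example, regexpr("\\beleph.*?\\b", example))

print(greedy)  # "elephant's friend." : .* runs on to the end of the string
print(lazy)    # "elephant"           : .*? stops at the first word boundary
```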

example = "You are the elephant's friend."
gsub("\\beleph.*?\\b", " _____ ", example)
[1] "You are the  _____ 's friend."

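This leaves the 's behind, but you can match on whitespace instead (the surrounding spaces are consumed by the match, which is why the replacement includes them), so you could try: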

example = "You are the elephant's friend."
gsub("\\seleph.*?\\s", " _____ ", example)
[1] "You are the _____ friend."
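One limitation worth knowing: because this pattern needs whitespace on both sides, it misses a word at the very start or end of the string, such as the exampleYoda sentence from the other answer. The second half is a variation of my own, not part of this answer: it accepts either ^ or a whitespace character before the prefix, puts that character back with "\\1", and lets \\S* take the rest of the word. Note that \\S* also swallows trailing punctuation such as a full stop.

```r
exampleYoda <- "elephant's friend you be."

# Nothing precedes "elephant" here, so "\\s" cannot match and nothing changes
res_space <- gsub("\\seleph.*?\\s", " _____ ", exampleYoda)
print(res_space)  # "elephant's friend you be."

# Hypothetical variation: start of string OR one whitespace char (kept via \\1),
# then the prefix, then any run of non-space characters (\\S*)
pattern <- "(^|\\s)eleph\\S*"
x <- c(exampleYoda, "You are the elephant's friend.", "I saw an elephant.")
res_alt <- gsub(pattern, "\\1_____", x)
print(res_alt)
# "_____ friend you be."  "You are the _____ friend."  "I saw an _____"
# (the full stop after "elephant" was absorbed by \\S*)
```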
