How do you apply a regex to a line that contains the same word several times with distinct meanings?

Question

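I have a sentence: "My dad, granddad and great great granddad looks alike." How do I create a regex, using grep, that extracts the values dad, granddad, and great great granddad?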

I tried str_extract_all(pattern = "(great)?\\s(grand)?(father|mother)", sentence), but with little success.

Answer 1

The following regex should work:

\b(?:(?:great )*granddad|dad)\b

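R code:

library(stringr)  # provides str_extract_all()
sentence <- "My dad, granddad and great great granddad looks alike."
# Backslashes are doubled inside an R string, so "\\b" is the regex \b.
str_extract_all(pattern = "\\b(?:(?:great )*granddad|dad)\\b", sentence)[[1]]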

[1] "dad"                  "granddad"             "great great granddad"

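The trick here is to use alternation, as you were already doing, but to put the more specific term first. Because the * quantifier is greedy, the pattern (?:great )*granddad first tries to match great great granddad, then great granddad (which does not actually occur in your sentence), and finally granddad. The \b word boundaries keep the match from starting or ending in the middle of a longer word.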
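
Since the question mentions grep, here is a minimal sketch of the same extraction in base R, without stringr. The helper name extract_relatives is hypothetical and only used for illustration; it assumes perl = TRUE so that the (?: ... ) groups and \b are interpreted as PCRE syntax.

```r
# Base R sketch: extract every match of the pattern from a single string.
# extract_relatives is a hypothetical helper name, not part of any package.
extract_relatives <- function(text) {
  # Backslashes are doubled because this is an R string literal.
  pattern <- "\\b(?:(?:great )*granddad|dad)\\b"
  # gregexpr() locates all matches; regmatches() pulls out the matched text.
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

sentence <- "My dad, granddad and great great granddad looks alike."
print(extract_relatives(sentence))
```

For the sentence above this should return the three strings "dad", "granddad", and "great great granddad", and an empty character vector when nothing matches.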

r

pattern-matching
