Numbered back references for groups with subgroups

I have the word 'fan(s)' that I want to replace with 'fanatic(s)' whenever it is preceded by one of the pronoun-verb combinations shown below.

gsub(
    "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)",
    '\\1\\2atic\\3',
    'He\'s the bigest fan I know.',
    perl = TRUE, ignore.case = TRUE
)

## [1] "He's the bigest He'saticHe's I know."

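I know the numbered back references are pointing at the inner parentheses of the first group. Is there a way to make them refer only to the three outer pairs of parentheses, where the three groups are, in pseudocode: (stuff before fan)(fan)(s\b)

I know my regex matches the whole span, because replacing it with an empty string removes all of it, so the pattern itself is valid. Only the back-reference part is the problem.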

gsub(
    "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)",
    '',
    'He\'s the bigest fan I know.',
    perl = TRUE, ignore.case = TRUE
)

## [1] " I know."

Desired output:

## [1] "He's the bigest fanatic I know."

Examples of matches

inputs <- c(
    "He's the bigest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)


outputs <- c(
    "He's the bigest fanatic I know.",
    "I am a huge fanatic of his.",
    "I know she has lots of fanatics in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)

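I understand you are struggling with the sheer number of capturing groups. Groups are numbered by the position of their opening parenthesis, so every nested pair takes up a number. Turn the groups you are not interested in into non-capturing ones with (?:...), or remove those that are completely redundant: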

((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\b(Fan)(s?)\b
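To see what the non-capturing groups change, one option is to list the captures with `regexec()` and `regmatches()`. This is only an illustrative sketch: the sample string comes from the question, and the variable names (`original`, `revised`, `groups_original`, `groups_revised`) are made up for the example.

```r
x <- "He's the bigest fan I know."

# Pattern from the question: every pair of parentheses is a capturing group.
original <- "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)"

# Revised pattern: helper groups use (?:...), so only three groups capture.
revised <- "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b"

# regexec() reports the full match first, then one entry per capturing group.
groups_original <- regmatches(
    x, regexec(original, x, perl = TRUE, ignore.case = TRUE)
)[[1]]
groups_revised <- regmatches(
    x, regexec(revised, x, perl = TRUE, ignore.case = TRUE)
)[[1]]

# Groups 1 to 3 of each pattern (element 1 is the full match, so skip it).
print(groups_original[2:4])
print(groups_revised[2:4])
```

With the original pattern, groups 2 and 3 both hold the pronoun-verb part ("He's"), which is why the replacement in the question repeats it. With the revised pattern, groups 1 to 3 hold the text before "fan", "fan" itself, and the optional "s", which is what `\\1\\2atic\\3` expects.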


Note that [Ff] can be written as just F or f, since you are using the ignore.case = TRUE argument.
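A small check of that claim, under the assumption that a shortened pattern behaves like the full one in this respect. The test sentences and the names `pat_class` and `pat_plain` are invented for the illustration.

```r
# Hypothetical sentences with different capitalisations of "fan".
x <- c("He's a Fan of jazz.", "He's a fan of jazz.", "HE'S A FAN OF JAZZ.")

# Shortened patterns: one keeps the [Ff] class, the other uses a plain "fan".
pat_class <- "(he's .{1,20})\\b([Ff]an)(s?)\\b"
pat_plain <- "(he's .{1,20})\\b(fan)(s?)\\b"

res_class <- gsub(pat_class, "\\1\\2atic\\3", x, perl = TRUE, ignore.case = TRUE)
res_plain <- gsub(pat_plain, "\\1\\2atic\\3", x, perl = TRUE, ignore.case = TRUE)

# With ignore.case = TRUE both patterns should give the same replacements.
print(identical(res_class, res_plain))
print(res_plain)
```

Because `\\2` re-inserts the text that was actually matched, the original capitalisation of "fan" is kept and only the lower-case "atic" is appended.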

R demo:

gsub(
    "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b",
    '\\1\\2atic\\3',
    inputs,
    perl = TRUE, ignore.case = TRUE
)

Output:

[1] "He's the bigest fanatic I know."                     
[2] "I am a huge fanatic of his."                         
[3] "I know she has lots of fans in his club"             
[4] "I was cold and turned on the fan"                    
[5] "An air conditioner is better than 2 fans at cooling."
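To check which inputs the pattern actually touches, a quick sketch with `grepl()` may help. The `inputs` vector is copied from the question. The names `pattern`, `hits` and `result` are only illustrative.

```r
inputs <- c(
    "He's the bigest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)

pattern <- "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b"

# TRUE where a pronoun-verb combination is followed, within 20 characters, by fan(s).
hits <- grepl(pattern, inputs, perl = TRUE, ignore.case = TRUE)

result <- gsub(pattern, "\\1\\2atic\\3", inputs, perl = TRUE, ignore.case = TRUE)

# One row per input: whether it matched and what it became.
print(data.frame(matched = hits, result = result))
```

Only the first two inputs match. The third stays unchanged because "she has" is not one of the pronoun-verb combinations in the pattern, even though the question's desired `outputs` lists "fanatics" for it. Covering that case would require adding the combination to the alternation.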