Regular expression - remove poly() from formula

Question

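I have a formula in R stored as a character vector, and I need to remove poly() from that formula if it is present.

An example, along with some of my (unsuccessful) attempts so far: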

p <- "(.*)poly\\((\\w.*)(.*)(\\))(.*)"
unique(sub(p, "\\1", "mined + poly(cover, 3) + spp"))
#> [1] "mined + "
unique(sub(p, "\\2", "mined + poly(cover, 3) + spp"))
#> [1] "cover, 3"
unique(sub(p, "\\3", "mined + poly(cover, 3) + spp"))
#> [1] ""
unique(sub(p, "\\4", "mined + poly(cover, 3) + spp"))
#> [1] ")"
unique(sub(p, "\\5", "mined + poly(cover, 3) + spp"))
#> [1] " + spp"

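The result I want:

Input: "mined + poly(cover, 3) + spp"

Output: "mined + cover + spp"

I have tried many patterns, but either poly( ..., 3) is not removed, or , 3) or , 3 is left in the resulting string... Thanks for your help! (By the way, the 3 is arbitrary; the pattern should remove any degree value...)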

Answer 1

Try this regex:

poly\(([^,]*)[^)]*\)

Replace each match with the contents of group 1.

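Explanation:

poly\( - matches poly(
([^,]*) - matches zero or more occurrences of any character that is not a comma. This is captured in group 1
[^)]*\) - matches zero or more occurrences of any character that is not ), followed by )

Now replace the whole match with the contents of group 1.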
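
The pattern above is written as a bare regex; inside an R string literal each backslash has to be doubled. A minimal sketch of applying it with gsub(), where `inp` is the string from the question and `multi` is a made-up formula with two poly() terms, assumed here only for illustration:

```r
# `inp` is the example from the question; `multi` is a hypothetical second input
inp   <- "mined + poly(cover, 3) + spp"
multi <- "y ~ poly(a, 2) + poly(b, 4)"

# Regex from this answer, with backslashes doubled for an R string literal
pat <- "poly\\(([^,]*)[^)]*\\)"

# Replace every poly(...) call with its first argument (capture group 1)
res_inp   <- gsub(pat, "\\1", inp)
res_multi <- gsub(pat, "\\1", multi)

print(res_inp)    # "mined + cover + spp"
print(res_multi)  # "y ~ a + b"
```

Since `[^,]*` stops at the first comma and `[^)]*` stops at the next closing parenthesis, each poly() call in these inputs is replaced on its own.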

Answer 2

inp <- "mined + poly(cover, 3) + spp"
gsub("poly\\((.+),\\s*\\d+\\)", "\\1", inp)
# [1] "mined + cover + spp"

Or, in a step-by-step way that is easier to follow (since you are struggling with more complex regexes):

library(magrittr)
gsub("[^a-zA-Z]", " ", inp) %>% # Drop everything that is not a letter, add space instead
  gsub("poly", "", .) %>%       # Drop the word poly
  gsub("\\s+", " + ", .)        # Add '+' back in. '\\s+' stands for one or more spaces
# [1] "mined + cover + spp"
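
One thing worth checking with the one-line pattern in this answer: `(.+)` is greedy, so when a formula has more than one poly() term it can capture across terms. A small sketch of this, where `multi` is a made-up input (not from the question) and the `[^,]+` variant is a suggested substitute rather than part of the original answer:

```r
# Hypothetical formula with two poly() terms
multi <- "y ~ poly(a, 2) + poly(b, 4)"

# Pattern from this answer: (.+) runs on to the last ", <digits>)" in the string
greedy <- gsub("poly\\((.+),\\s*\\d+\\)", "\\1", multi)

# Assumed alternative: [^,]+ cannot cross a comma, so each term matches separately
narrow <- gsub("poly\\(([^,]+),\\s*\\d+\\)", "\\1", multi)

print(greedy)  # "y ~ a, 2) + poly(b"
print(narrow)  # "y ~ a + b"
```

For a single poly() term, as in the question, both patterns give the same result.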

regex

r

formula