Regular expression: extract string between two characters/strings

Question

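I have a model formula (as a string) and want to extract the value of a specific argument, in my case id. So far I have found a way to return the string without the desired value. I want the exact opposite: only the value that is missing from the result: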

xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
sub("(?=(id=|id =))([a-zA-Z].*)(?=,)", "\\1", xx, perl =T)
#> [1] "gee(formula = breaks ~ tension, id =, data = warpbreaks)"

wool is missing from the returned value, but I want only wool as the result string... Can anyone help me find the right regex pattern?
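To see why wool disappears, it can help to look at what the pattern actually matches. A minimal sketch using only base R (regexpr(), regexec(), regmatches()); the variable name pat is just for illustration:

```r
xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
pat <- "(?=(id=|id =))([a-zA-Z].*)(?=,)"

# The whole match: the lookahead consumes nothing, so group 2 starts at "id"
# and the greedy .* runs up to the last character that is followed by a comma
regmatches(xx, regexpr(pat, xx, perl = TRUE))
# "id = wool"

# Full match, group 1 (captured inside the lookahead), group 2
regmatches(xx, regexec(pat, xx, perl = TRUE))[[1]]
# "id = wool" "id =" "id = wool"
```

So the whole text id = wool is matched and then replaced by group 1, which only ever holds id =.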

Answer 1

You can use

xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
sub(".*\\bid\\s*=\\s*(\\w+).*", "\\1", xx)
## or, if the extracted value may contain any characters except commas
sub(".*\\bid\\s*=\\s*([^,]+).*", "\\1", xx)

See the R demo and the regex demo.

Details

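.* - any 0+ characters, as many as possible
\bid - the whole word id (\b is a word boundary)
\s*=\s* - an = enclosed with 0+ whitespace characters
(\w+) - capturing group 1 (the \1 in the replacement pattern refers to this value): one or more letters, digits or underscores (or, in the second pattern, [^,]+ matches 1+ characters other than a comma)
.* - the rest of the string.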
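The difference between (\w+) and [^,]+ only shows up when the value contains non-word characters. A small sketch, assuming a hypothetical id value my.id that is not part of the question:

```r
# Hypothetical value containing a dot (not from the question)
x2 <- "gee(formula = breaks ~ tension, id = my.id, data = warpbreaks)"

sub(".*\\bid\\s*=\\s*(\\w+).*", "\\1", x2)   # "my"    (\w stops at the dot)
sub(".*\\bid\\s*=\\s*([^,]+).*", "\\1", x2)  # "my.id" (everything up to the comma)
```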

Other alternatives:

> xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
> regmatches(xx, regexpr("\\bid\\s*=\\s*\\K[^,]+", xx, perl=TRUE))
[1] "wool"

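The pattern matches id, then = enclosed with 0+ whitespace characters; \K then discards the text matched so far, so only the 1+ characters other than , end up in the match value.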
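A quick way to see what \K does is to run the same pattern with and without it. This sketch uses only base R; \K is a PCRE feature, so perl = TRUE is required:

```r
xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"

# Without \K the reported match includes the "id = " prefix
regmatches(xx, regexpr("\\bid\\s*=\\s*[^,]+", xx, perl = TRUE))
# "id = wool"

# With \K everything matched before it is dropped from the reported match
regmatches(xx, regexpr("\\bid\\s*=\\s*\\K[^,]+", xx, perl = TRUE))
# "wool"
```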

Alternatively, a capturing approach with stringr::str_match also works here:

> library(stringr)
> str_match(xx, "\\bid\\s*=\\s*([^,]+)")[,2]
[1] "wool"
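If stringr is not available, the same capture-group idea can be written in base R with regexec() and regmatches(). The helper name extract_arg() is hypothetical, and it assumes the argument name contains no regex metacharacters. Excluding ) as well as , lets it handle the last argument of the call too, and it returns NA when the argument is absent:

```r
xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"

# Hypothetical helper: value of argument `name` in each call string, or NA.
# Assumes `name` contains no regex metacharacters.
extract_arg <- function(x, name) {
  # same idea as the answer's pattern; [^,)]+ also stops at the closing paren
  pattern <- paste0("\\b", name, "\\s*=\\s*([^,)]+)")
  # regexec() records the capture group; regmatches() returns, per string,
  # c(full_match, group_1), or character(0) when there is no match
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(m, function(v) if (length(v) > 1) trimws(v[2]) else NA_character_,
         character(1))
}

extract_arg(xx, "id")    # "wool"
extract_arg(xx, "data")  # "warpbreaks"
extract_arg(c(xx, "glm(formula = y ~ x, data = d)"), "id")  # "wool" NA
```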

Answer 2

Instead of a regex, you can parse() the string here and get the id argument by name.

as.character(parse(text = xx)[[1]]$id)
# [1] "wool"
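The parse() approach also gives access to the other arguments of the call, and it copes with values that contain commas, which the [^,]+ regex cannot. A minimal sketch; str2lang() (available since R 3.6.0) returns the same call as parse(text = xx)[[1]], and the interaction(a, b) call is a made-up example:

```r
xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"

# str2lang() turns the string into an unevaluated call object
e <- str2lang(xx)
names(e)             # "" "formula" "id" "data"
as.character(e$id)   # "wool"
deparse(e$formula)   # "breaks ~ tension"
is.null(e$weights)   # TRUE: argument not supplied

# A value that contains a comma: the regex stops at the first comma,
# while the parsed call keeps the whole argument expression
yy <- "gee(formula = y ~ x, id = interaction(a, b), data = d)"
sub(".*\\bid\\s*=\\s*([^,]+).*", "\\1", yy)  # "interaction(a"
deparse(str2lang(yy)$id)                     # "interaction(a, b)"
```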
