How to use a regex to find all curly braces inside curly braces?
I am using Zotero to create a BibTeX reference list from PDFs. It wraps words whose capitalization must be preserved in { }.
title = {Novel breeding habitat, oviposition microhabitat, and
parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in
southeastern {Brazil}},
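However, some people on my team use Mendeley, which does not seem to know this rule of the BibTeX format, so the { } still appear in their titles after they import the BibTeX file I send them.
So I would like to write a small script (in R) that removes the { } nested inside the main { } of the title (and other fields), so that the lines above become the following in the modified file.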
title = {Novel breeding habitat, oviposition microhabitat, and
parental care in Bokermannohyla caramaschii (Anura: Hylidae) in
southeastern Brazil},
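I have tried a lot, but nothing worked. What regular expression would do this?
If we can be sure that the strings "%%%" and "###" never appear in a title, then this is a workable strategy. First change the first "{" to "%%%" and the last "}" to "###". Then remove all remaining "{" and "}", and finally put the first "{" and the last "}" back.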
txt <- "title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},"
txt2 <- sub("(^[^{]+)(\\{)", "\\1%%%", txt) # placeholder for first "{"
txt3 <- sub("(\\})([^}]*$)", "###\\2", txt2) # placeholder for last "}"
txt4 <- gsub("\\{|\\}", "", txt3) # remove the rest
txt5 <- sub("%%%", "{", txt4) # put the leading and trailing ones back
txt6 <- sub("###", "}", txt5)
txt6
[1] "title = {Novel breeding habitat, oviposition microhabitat, and parental care in Bokermannohyla caramaschii (Anura: Hylidae) in southeastern Brazil},"
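The placeholder steps above could be wrapped in a single function. The following is only a sketch: the name `strip_inner_braces` and its `open`/`close` arguments are hypothetical, and the guard at the top simply enforces the stated assumption that the placeholders do not already occur in the input. The example string is an abbreviated version of the title from the question.

```r
# Hypothetical helper wrapping the placeholder strategy (not part of any package).
strip_inner_braces <- function(x, open = "%%%", close = "###") {
  # The strategy is only safe if the placeholders are absent from the input.
  stopifnot(!grepl(open, x, fixed = TRUE), !grepl(close, x, fixed = TRUE))
  x <- sub("(^[^{]+)\\{", paste0("\\1", open), x)   # mark the first "{"
  x <- sub("\\}([^}]*$)", paste0(close, "\\1"), x)  # mark the last "}"
  x <- gsub("[{}]", "", x)                          # drop every remaining brace
  x <- sub(open, "{", x, fixed = TRUE)              # restore the outer braces
  sub(close, "}", x, fixed = TRUE)
}

short <- "title = {Parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae})},"
result <- strip_inner_braces(short)
print(result)
```

Because `sub` and `gsub` are vectorized, the same function can be applied to a character vector holding one field per element, provided each field sits on a single line.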
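Here is a parser that removes { and } only when they occur inside a complete { ... } pair. It makes no claim to be fast or efficient, but with strings of reasonable length you should not notice any delay.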
func <- function(S) {
  spl <- strsplit(S, "")[[1]]
  out <- character(0)
  inbrace <- 0L  # current brace nesting depth
  for (i in seq_along(spl)) {
    ch <- spl[i]
    if (ch == "{") {
      # keep only the outermost opening brace
      if (inbrace < 1L) out <- c(out, ch)
      inbrace <- inbrace + 1L
    } else if (ch == "}") {
      if (inbrace == 0L) {
        stop("unmatched close brace at: ", i)
      } else if (inbrace == 1L) {
        # keep only the outermost closing brace
        out <- c(out, ch)
      }
      inbrace <- max(0L, inbrace - 1L)
    } else out <- c(out, ch)
  }
  if (inbrace != 0L) stop("finished missing ", inbrace, " close-brace(s)")
  paste(out, collapse = "")
}
Demo:
func('title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},')
# [1] "title = {Novel breeding habitat, oviposition microhabitat, and parental care in Bokermannohyla caramaschii (Anura: Hylidae) in southeastern Brazil},"
It tries to be strict: it fails if it encounters an unmatched } or if the input ends while a { is still unmatched.
func('title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil},')
# Error in func("title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil},") :
# finished missing 1 close-brace(s)
func('title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla}} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},')
# Error in func("title = {Novel breeding habitat, oviposition microhabitat, and parental care in {Bokermannohyla}} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},") :
# unmatched close brace at: 156
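For comparison, the same depth-counting idea can be sketched without an explicit loop by computing the nesting depth with `cumsum`. This is a different technique from the parser above, not a rewrite of it: the name `strip_nested` is hypothetical, and the sketch does no validation, so unbalanced input is not reported as an error.

```r
# Hypothetical vectorized variant: keep a brace only when it sits at depth 1.
strip_nested <- function(s) {
  ch <- strsplit(s, "")[[1]]
  is_open <- ch == "{"
  is_close <- ch == "}"
  # Depth after each character has been processed.
  depth_after <- cumsum(is_open) - cumsum(is_close)
  # A closing brace belongs to the depth it had before it was processed.
  depth <- ifelse(is_close, depth_after + 1L, depth_after)
  keep <- !(is_open | is_close) | depth == 1L
  paste(ch[keep], collapse = "")
}

short <- "title = {Parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae})},"
result <- strip_nested(short)
print(result)
```

The trade-off is that this version is shorter and avoids growing `out` one character at a time, but it silently accepts malformed input that the loop-based parser would reject.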
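You can replace matches of the regular expression
(?<!^title = ){|}(?!,$)
with the empty string (using perl=TRUE).
The regular expression can be broken down as follows. (I show spaces as character classes containing a space so that the reader can see them.)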
(?<!            # begin a negative lookbehind
  ^             # match the start of the string
  title[ ]=[ ]  # match 'title = '
)               # end the negative lookbehind
{               # match '{'
|               # or
}               # match '}'
(?!             # begin a negative lookahead
  ,$            # match a comma at the end of the string
)               # end the negative lookahead
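In R, the replacement described above might look like the following sketch. The braces are escaped in the pattern as a precaution, and the variable names `pattern` and `cleaned` are only illustrative. The example string is an abbreviated version of the title from the question.

```r
txt <- "title = {Parental care in {Bokermannohyla} caramaschii ({Anura}: {Hylidae}) in southeastern {Brazil}},"

# Lookbehind and lookahead require the PCRE engine, hence perl = TRUE.
pattern <- "(?<!^title = )\\{|\\}(?!,$)"
cleaned <- gsub(pattern, "", txt, perl = TRUE)
print(cleaned)
```

Note that the pattern hard-codes the field name `title` and relies on the field occupying a single line that ends in `},`, so other fields or wrapped lines would need an adapted pattern.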