Remove non-numeric characters within parentheses
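I want to remove the non-numeric characters within specific parentheses, and remove the *other* parentheses on that line. See the example below: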
text <- c("1110383 Project something 11/22/2019 (WSO) (89021-design)
John Doe (John.Doe@company22.com)",
"1110383 Project something 11/22/2019 ASP (890212-wso)
John Doe (John.Doe@company22.com)
Other Stuff",
"1110383 Project something SD (890212)
John Doe (John.Doe@company22.com)")
The expected output is:
cat(paste0(myoutxt, collapse = "\n"))
# 1110383 Project something 11/22/2019 WSO (89021)
# John Doe (John.Doe@company22.com)
# 1110383 Project something 11/22/2019 ASP (890212)
# John Doe (John.Doe@company22.com)
# 1110383 Project something SD (890212)
# John Doe (John.Doe@company22.com)
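I came up with a regex that matches my 5- or 6-digit number, but I'm not sure what to replace it with. I also think the regex below needs changing, because it doesn't account for other parentheses that may be present and should be removed: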
^.*?\([^\d]*(\d{5,6})[^\d]*\).*$
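To see why this pattern on its own doesn't help with the replacement, here is a quick check, a sketch using base R's sub() with perl = TRUE on a single line (the variable names line, whole_line and res are just for illustration). Because the pattern is anchored with ^...$ it consumes the whole line, so replacing with the captured group leaves only the digits. Note also that [^\d]* happily runs across ")" and "(", so here it swallows "WSO) (" on its way to the number.

```r
# One line from the example data (single line, no embedded newline)
line <- "1110383 Project something 11/22/2019 (WSO) (89021-design)"

# The pattern from the question: it matches the entire line and captures
# only the 5-6 digit number, so replacing with \\1 discards everything else
whole_line <- '^.*?\\([^\\d]*(\\d{5,6})[^\\d]*\\).*$'
res <- sub(whole_line, '\\1', line, perl = TRUE)
print(res)
# [1] "89021"
```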
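Logic:
Basically, I want to find lines that have a 5- to 6-digit number between parentheses (e.g. 89021 or 890212). Then, if there is anything else inside those parentheses (e.g. -design or -wso), I want to remove it. Finally, if there are other parentheses on that particular line (e.g. (WSO)), I want to remove the parentheses but not the word.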
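The three steps of this logic can be mapped almost one-to-one onto three regex calls applied line by line. This is a minimal base-R sketch, not taken from any of the answers; the helper names clean_line and clean_text are hypothetical, and it assumes the "other" parentheses to unwrap never contain digits (so the email group is left alone).

```r
text <- c("1110383 Project something 11/22/2019 (WSO) (89021-design)\nJohn Doe (John.Doe@company22.com)",
          "1110383 Project something 11/22/2019 ASP (890212-wso)\nJohn Doe (John.Doe@company22.com)\nOther Stuff",
          "1110383 Project something SD (890212)\nJohn Doe (John.Doe@company22.com)")

clean_line <- function(line) {
  # Step 1: only touch lines with a 5-6 digit number inside parentheses
  if (!grepl('\\([^()\\d]*\\d{5,6}[^()\\d]*\\)', line, perl = TRUE))
    return(line)
  # Step 2: inside such parentheses, drop everything except the digits
  line <- gsub('\\([^()\\d]*(\\d{5,6})[^()\\d]*\\)', '(\\1)', line, perl = TRUE)
  # Step 3: unwrap any other parenthesized group that contains no digits
  gsub('\\(([^()\\d]+)\\)', '\\1', line, perl = TRUE)
}

clean_text <- function(x) {
  # Split each element into lines, clean every line, glue the lines back
  vapply(strsplit(x, '\n', fixed = TRUE),
         function(lines) paste(vapply(lines, clean_line, character(1)), collapse = '\n'),
         character(1))
}

res <- clean_text(text)
cat(paste0(res, collapse = "\n"))
```

Doing the check in step 1 first keeps lines without a 5-6 digit group (such as the email line) completely untouched.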
How about replacing
(?:\(([^)\d]+)\)(.*?))?\([^\d)]*(\d{5,6})[^\d)]*\)
with
\1\2(\3)
- (?:\(([^)\d]+)\)(.*?))? : the first, optional part captures the content of any parenthesized group to \1, and whatever follows it before the parenthesized 5-6 digit part to \2
- \([^\d)]*(\d{5,6})[^\d)]*\) : the second part captures the 5-6 digits to \3
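To see what ends up in \1, \2 and \3, the match can be inspected with base R's regexec() (perl = TRUE) and regmatches(). This is a sketch on the first line of the example only; the variable names are illustrative.

```r
line <- "1110383 Project something 11/22/2019 (WSO) (89021-design)"
pattern <- '(?:\\(([^)\\d]+)\\)(.*?))?\\([^\\d)]*(\\d{5,6})[^\\d)]*\\)'

# m[1] is the whole match, m[2], m[3], m[4] are groups \1, \2, \3
m <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]
print(m)
```

Here \1 is "WSO", \2 is the single space between the two groups, and \3 is "89021", so the replacement \1\2(\3) turns "(WSO) (89021-design)" into "WSO (89021)".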
See the demo at regex101: https://regex101.com/r/C1Dflp/4
In R, using gsub:
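# perl = TRUE uses PCRE, which handles (?:...), the lazy .*? and \d inside [...]
gsub(pattern = '(?:\\(([^)\\d]+)\\)(.*?))?\\([^\\d)(]*(\\d{5,6})[^\\d)(]*\\)',
     replacement = '\\1\\2(\\3)',
     x = text,
     perl = TRUE, fixed = FALSE)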
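A self-contained run of this gsub() call on the question's data, printed the same way as the expected output. When a line has no digit-free group before the number (the second and third elements), groups \1 and \2 don't participate and R substitutes empty strings for them, so only "(\3)" is written.

```r
text <- c("1110383 Project something 11/22/2019 (WSO) (89021-design)\nJohn Doe (John.Doe@company22.com)",
          "1110383 Project something 11/22/2019 ASP (890212-wso)\nJohn Doe (John.Doe@company22.com)\nOther Stuff",
          "1110383 Project something SD (890212)\nJohn Doe (John.Doe@company22.com)")

res <- gsub(pattern = '(?:\\(([^)\\d]+)\\)(.*?))?\\([^\\d)(]*(\\d{5,6})[^\\d)(]*\\)',
            replacement = '\\1\\2(\\3)',
            x = text,
            perl = TRUE)
cat(paste0(res, collapse = "\n"))
# 1110383 Project something 11/22/2019 WSO (89021)
# John Doe (John.Doe@company22.com)
# 1110383 Project something 11/22/2019 ASP (890212)
# John Doe (John.Doe@company22.com)
# Other Stuff
# 1110383 Project something SD (890212)
# John Doe (John.Doe@company22.com)
```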
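Is this what you want?
"\\(([^0-9@]*)\\)": removes the parentheses, but keeps their content, around any text that contains neither digits nor @
"\\((\\d{5,6}).*\\)": for parentheses containing 5 to 6 digits plus anything else, keeps only the digits (still wrapped in parentheses)
I'm assuming the other set of parentheses always contains an email address.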
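str_replace() only replaces the first match in each string, and "." does not match a newline, which is what keeps the email parentheses intact here. A rough base-R equivalent, assuming sub() (which likewise replaces only the first match) with perl = TRUE in place of stringr's ICU engine:

```r
text <- c("1110383 Project something 11/22/2019 (WSO) (89021-design)\nJohn Doe (John.Doe@company22.com)",
          "1110383 Project something 11/22/2019 ASP (890212-wso)\nJohn Doe (John.Doe@company22.com)\nOther Stuff",
          "1110383 Project something SD (890212)\nJohn Doe (John.Doe@company22.com)")

# Step 1: unwrap the first parenthesized group with no digits and no "@"
step1 <- sub('\\(([^0-9@]*)\\)', '\\1', text, perl = TRUE)
# Step 2: in the first group starting with 5-6 digits, keep only the digits.
# ".*" stops at the newline, so it cannot reach the email's ")" on the next line
res <- sub('\\((\\d{5,6}).*\\)', '(\\1)', step1, perl = TRUE)
cat(paste0(res, collapse = "\n"))
```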
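library(stringr)
cat(
  paste0(
    str_replace(
      # unwrap the first digit-free, "@"-free group, e.g. "(WSO)" -> "WSO"
      str_replace(text, "\\(([^0-9@]*)\\)", "\\1"),
      # keep only the 5-6 digits, re-wrapped in parentheses
      "\\((\\d{5,6}).*\\)",
      "(\\1)"),
    collapse = "\n"
  )
)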
# 1110383 Project something 11/22/2019 WSO (89021)
# John Doe (John.Doe@company22.com)
# 1110383 Project something 11/22/2019 ASP (890212)
# John Doe (John.Doe@company22.com)
# Other Stuff
# 1110383 Project something SD (890212)
# John Doe (John.Doe@company22.com)
Here is a lateral approach:
fun_0 <- function(string) {
  # Split on "(" and ")": pieces at even positions were inside brackets.
  # strsplit() yields a leading "" when the string starts with "(",
  # so the bracketed pieces sit at positions 2, 4, ... in every case.
  vec <- strsplit(string, '\\(|\\)', perl = TRUE)[[1L]]
  e <- length(vec)
  if (e < 2L)
    return(string)
  inside_brackets <- seq(2L, e, 2L)
  # Keep only the 5-6 digit number, re-wrapped in parentheses; bracketed
  # pieces without such a number (e.g. "WSO") simply lose their brackets
  vec[inside_brackets] <- gsub('\\D*(\\d{5,6})\\D*', '(\\1)', vec[inside_brackets])
  paste(vec, collapse = '')
}
fun_1 <- function(string_vec) {
  # only process lines that contain a run of 4 or more digits
  to_process <- grepl('\\d{4,}', string_vec)
  string_vec[to_process] <- vapply(string_vec[to_process], fun_0, character(1))
  paste(string_vec, collapse = '\n')
}
fun_2 <- function(text) {
  # split each element into lines, process them, and join them back
  string_list <- strsplit(text, '\n')
  vapply(string_list, fun_1, character(1))
}
Example:
text <- c("1110383 Project something 11/22/2019 (WSO) (89021-design)\nJohn Doe (John.Doe@company22.com)",
          "1110383 Project something 11/22/2019 ASP (890212-wso)\nJohn Doe (John.Doe@company22.com)\nOther Stuff",
          "1110383 Project something SD (890212)\nJohn Doe (John.Doe@company22.com)")
fun_2(text)
# [1] "1110383 Project something 11/22/2019 WSO (89021)\nJohn Doe (John.Doe@company22.com)"
# [2] "1110383 Project something 11/22/2019 ASP (890212)\nJohn Doe (John.Doe@company22.com)\nOther Stuff"
# [3] "1110383 Project something SD (890212)\nJohn Doe (John.Doe@company22.com)"
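This approach relies on strsplit() returning alternating outside/inside pieces. A quick base-R check of that behaviour, including the case where the string starts with "(" (the variable names are illustrative):

```r
# Pieces at even positions (2, 4, ...) are the bracket contents
pieces <- strsplit("1110383 Project something 11/22/2019 (WSO) (89021-design)",
                   '\\(|\\)', perl = TRUE)[[1L]]
print(pieces)
inside <- pieces[seq(2L, length(pieces), 2L)]
print(inside)

# A string that starts with "(" gets a leading "", so the bracket
# contents are still at the even positions
lead <- strsplit("(WSO) text", '\\(|\\)', perl = TRUE)[[1L]]
print(lead)
```

A match at the very end of the string does not add a trailing "", which is why the last piece of the first example is "89021-design" rather than an empty string.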