How to split strings based on conditions in R?
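I want to split a single string into multiple strings at the words 'split here', but only where they occur between '>' and '<', and without deleting any characters other than the words 'split here' themselves.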
text <- c("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >")
Expected output:
text[1] = "Don't split here > yes here "
text[2] = "and blah blah < again don't (anything could be here) split here >"
I tried
gsub(">(.*split here.*)<","", text)
but this doesn't seem to work. Can someone help me with a regular expression for this?
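Replace the required string with the control character "\1" (written below as "\001"), which is assumed not to occur in the text, and then split on that character:
strsplit(gsub("(>[^<]+) split here ([^<]+<)", "\\1\001\\2", text), "\001")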
## [[1]]
## [1] "Don't split here > yes here"
## [2] "and blah blah < again don't (anything could be here) split here >"
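If the input is a character vector, the output is a list with one element per input string. To flatten it, use unlist(s), where s is the result of the line of code above.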
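To illustrate the list-versus-vector point, here is a minimal sketch that applies the same marker approach to a two-element vector. The helper name split_between and the second input string are made up for illustration, and the sketch assumes the control character "\001" never occurs in the input.

```r
# Hypothetical helper: mark the split point with a control character, then split on it.
# Assumption: "\001" does not appear anywhere in the input strings.
split_between <- function(x, sep = "\001") {
  marked <- gsub("(>[^<]+) split here ([^<]+<)", paste0("\\1", sep, "\\2"), x)
  strsplit(marked, sep, fixed = TRUE)
}

text <- c(
  "Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >",
  "nothing to split here"  # made-up second element with no '>' ... '<' section
)

s <- split_between(text)  # a list with one element per input string
flat <- unlist(s)         # a plain character vector of all pieces
print(s)
print(flat)
```

Note that this pattern marks at most one 'split here' per '>' ... '<' section, because the rest of the section is consumed by the same match.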
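You can use a plain strsplit with the following regex, which relies on the \K operator (this requires perl=TRUE) to give you the desired strings. \K drops everything matched so far from the reported match, so only 'split here' and any trailing whitespace are removed, and the lookahead requires that a '<' still follows.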
>[^>]*?\Ksplit here\s*(?=[^<]*<)
strsplit("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >", ">[^>]*?\\Ksplit here\\s*(?=[^<]*<)", perl=TRUE)
This prints:
[[1]]
[1] "Don't split here > yes here "
[2] "and blah blah < again don't (anything could be here) split here >"
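To see what the pattern actually consumes, one option is to run it through base R's gregexpr and regmatches instead of strsplit. This is only a sketch for inspecting the match; the variable names are illustrative.

```r
text <- "Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >"

# Same pattern as above, with backslashes doubled for an R string literal.
pattern <- ">[^>]*?\\Ksplit here\\s*(?=[^<]*<)"

m <- gregexpr(pattern, text, perl = TRUE)

# The text the pattern removes (everything before \K is not part of the match).
removed <- regmatches(text, m)

# The text left over on either side of each match.
pieces <- regmatches(text, m, invert = TRUE)

print(removed)
print(pieces)
```

With invert = TRUE, regmatches returns the parts of the string that lie between the matches, which gives the same kind of split while making the removed text visible.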
You can also do it this way, using the stringr package:
library(stringr)
str_split(gsub(str_extract(text, "(?<=>).*?(?=<)"), gsub("split here", "nsplit here", str_extract(text, "(?<=>).*?(?=<)")), text), "nsplit here")
Output:
[[1]]
[1] "Don't split here > yes here "
[2] " and blah blah < again don't (anything could be here) split here >"