How to split strings based on conditions in R?

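I want to split a single string into multiple strings at the words 'split here', but only when they occur between '>' and '<', and without deleting any characters other than the words 'split here' themselves.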

text <- c("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >")

Expected output:

text[1] = "Don't split here > yes here "
text[2] = "and blah blah < again don't (anything could be here) split here >"

I tried:

gsub(">(.*split here.*)<","", text)

but this doesn't seem to work. Could a regex expert help me?

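Replace the delimiter with the control character \1 (keeping the text on either side through backreferences), then split on \1:

strsplit(gsub("(>[^<]+) split here ([^<]+<)", "\\1\1\\2", text), "\1")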
## [[1]]
## [1] "Don't split here > yes here"                                      
## [2] "and blah blah < again don't (anything could be here) split here >"

If the input is a character vector, the output will be a list; if you want to flatten it, just use unlist(s), where s is the result of the line of code above.
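
To illustrate the list-versus-vector point, here is a minimal sketch. It assumes a hypothetical two-element input called texts; the second string is made up and contains no 'split here' between '>' and '<'. The control character is written as \001, which is the same character as \1.

```r
# Hypothetical input: only the first string has "split here" between '>' and '<'.
texts <- c(
  "Don't split here > yes here split here and blah blah < again don't split here >",
  "nothing to split > in this one <"
)

# Swap the qualifying delimiter for the control character \001, then split on it.
marked <- gsub("(>[^<]+) split here ([^<]+<)", "\\1\001\\2", texts)
s <- strsplit(marked, "\001", fixed = TRUE)

length(s)   # one list element per input string
unlist(s)   # all pieces flattened into a single character vector
```

Here s is a list of length 2, and unlist(s) returns three strings: two pieces from the first input and the untouched second input.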

You can use a plain strsplit with this regex, which relies on the \K operator (this requires perl=TRUE), to get the strings you want.

>[^>]*?\Ksplit here\s*(?=[^<]*<)

R code:

strsplit("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >", ">[^>]*?\\Ksplit here\\s*(?=[^<]*<)", perl=TRUE)

This prints:

[[1]]
[1] "Don't split here > yes here "                                     
[2] "and blah blah < again don't (anything could be here) split here >"
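
For readers less familiar with \K and lookaheads, the same call can be sketched with the pattern annotated piece by piece. The helper name split_between is hypothetical and used only for illustration; it is not part of base R.

```r
# Pattern pieces (PCRE syntax, hence perl = TRUE):
#   >[^>]*?          a '>' followed, lazily, by characters other than '>'
#   \\K              discard what was matched so far from the reported match,
#                    so that text is kept in the output rather than removed
#   split here\\s*   the delimiter plus any trailing whitespace (the part removed)
#   (?=[^<]*<)       lookahead: a '<' must still appear later in the string
pattern <- ">[^>]*?\\Ksplit here\\s*(?=[^<]*<)"

# Hypothetical wrapper around strsplit(); returns a list, one element per input string.
split_between <- function(x) strsplit(x, pattern, perl = TRUE)

text <- "Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >"
parts <- split_between(text)[[1]]
parts
```

The 'split here' before the first '>' is left alone because the pattern must start at a '>', and the one at the end is left alone because no '<' follows it.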

You can also do it this way (using the stringr package):

library(stringr)
str_split(gsub(str_extract(text, "(?<=>).*?(?=<)"), gsub("split here", "\nsplit here", str_extract(text, "(?<=>).*?(?=<)")), text), "\nsplit here")

Output:

[[1]]
[1] "Don't split here > yes here "                                      
[2] " and blah blah < again don't (anything could be here) split here >"