How to split strings based on conditions in R?

Question

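I want to split a single string into multiple strings at the words 'split here', but only when they occur between '>' and '<', and without removing any characters other than the words 'split here'.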

text <- c("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >")

Expected output:

text[1] = "Don't split here > yes here "
text[2] = "and blah blah < again don't (anything could be here) split here >"

I tried:

gsub(">(.*split here.*)<","", text)

But this doesn't seem to work. Can someone help me with a regular expression?
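Running the attempt makes the problem visible: gsub() only substitutes text, so this pattern deletes everything from the '>' to the '<' instead of cutting the string into pieces. A minimal check, with the text vector from above redefined so the snippet runs on its own (the name out is illustrative):

```r
text <- c("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >")

# The match runs from the '>' to the '<' (greedy .* on both sides of
# "split here"), and the replacement "" deletes that whole span
out <- gsub(">(.*split here.*)<", "", text)
out
length(out)  # still a single string: gsub() never splits
```

The result is "Don't split here  again don't (anything could be here) split here >", which is why the answers below pair a regex with strsplit() or str_split().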

Answer 1

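Replace " split here " with the control character \1 (octal 001), which is unlikely to appear in ordinary text, and then split on \1: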

strsplit(gsub("(>[^<]+) split here ([^<]+<)", "\\1\1\\2", text), "\1")
## [[1]]
## [1] "Don't split here > yes here"
## [2] "and blah blah < again don't (anything could be here) split here >"

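strsplit() returns a list with one element per input string, so for a character vector input the result is a list; if you want to flatten it, just use unlist(s), where s is the result of the line of code above.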
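The same approach written out step by step, as a sketch so the intermediate string with the sentinel is visible; the variable names marked and s are illustrative only, and passing fixed = TRUE to strsplit() is an optional tweak so the sentinel is matched literally rather than as a regex.

```r
text <- c("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >")

# Step 1: keep the captured text on both sides (\\1 and \\2) and put the
# control character "\1" (octal 001) where " split here " used to be
marked <- gsub("(>[^<]+) split here ([^<]+<)", "\\1\1\\2", text)

# Step 2: split on the sentinel; fixed = TRUE treats it as a literal character
s <- strsplit(marked, "\1", fixed = TRUE)

# Step 3: flatten the one-element list into a plain character vector
unlist(s)
```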

Answer 2

You can use a plain strsplit() with this regex, which uses the \K operator (this requires perl = TRUE) to give you the strings you want:

>[^>]*?\Ksplit here\s*(?=[^<]*<)

strsplit("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >", ">[^>]*?\\Ksplit here\\s*(?=[^<]*<)", perl = TRUE)

This prints:

[[1]]
[1] "Don't split here > yes here "                                     
[2] "and blah blah < again don't (anything could be here) split here >"
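To see what \K contributes, here is a sketch that runs the same pattern through base R's regexpr() and regmatches() (the names pattern and m are illustrative). With perl = TRUE, \K discards everything matched before it from the reported match, so only "split here " counts as the separator and the "> yes here " prefix stays in the first piece.

```r
text <- "Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >"
pattern <- ">[^>]*?\\Ksplit here\\s*(?=[^<]*<)"

# Locate the first match; \K moves the reported start past the '>' prefix
m <- regexpr(pattern, text, perl = TRUE)

# The matched text is only the separator itself, trailing space included
regmatches(text, m)

# Everything before the match is what strsplit() returns as the first piece
substr(text, 1, m[1] - 1)
```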

Answer 3

You can also do it this way with stringr:

library(stringr)  # str_extract(), str_split()
str_split(gsub(str_extract(text, "(?<=>).*?(?=<)"), gsub("split here", "\nsplit here", str_extract(text, "(?<=>).*?(?=<)")), text, fixed = TRUE), "\nsplit here")

Output:

[[1]]
[1] "Don't split here > yes here "
[2] " and blah blah < again don't (anything could be here) split here >"

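Answer 3's idea, editing only the text between '>' and '<' and then splitting on a marker, can be generalised with base R's gregexpr() and the `regmatches<-` replacement function so that every '>'...'<' segment, and every occurrence inside it, is handled. This is a sketch, not part of any answer above: the helper name split_inside() and the sentinel "\001" are hypothetical choices, and it assumes segments do not nest and that "\001" never appears in the input.

```r
# split_inside() is a hypothetical helper written for this sketch.
# Assumes '>' ... '<' segments do not nest and "\001" is not in the input.
split_inside <- function(x, sentinel = "\001") {
  # Find every segment that starts at '>' and ends at the next '<'
  m <- gregexpr(">[^<>]*<", x)
  # Inside each segment, swap "split here" plus any following whitespace
  # for the sentinel; text outside the segments is left untouched
  regmatches(x, m) <- lapply(regmatches(x, m), function(seg) {
    gsub("split here[[:space:]]*", sentinel, seg)
  })
  # Cut each string at the sentinels; returns a list, like strsplit()
  strsplit(x, sentinel, fixed = TRUE)
}

text <- c("Don't split here > yes here split here and blah blah < again don't (anything could be here) split here >")
split_inside(text)

# Several occurrences inside one segment are all split
split_inside("a > b split here c split here d < e")
```

Because the whitespace after "split here" is consumed but the space before it is kept, the first call reproduces the expected output from the question exactly.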
regex

r

gsub