How to use sub only when there are multiple values

Question

Here is a short example from the data frame:

x<- c("WB (16)","CT (14)WB (15)","ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")

My question is how to get sub(".*?)", "", x) (or a different function) to work so that I end up with the following result:

x<-c("WB (16)","WB (15)","TN-SE(17)")

instead of

x<-c("","WB (15)")

The letter codes vary, so it is not only WB, CT and TN-SE. For example:

 "NBIO(15)"    "CITG-TP(08)" "BK-AR(10)"

So it needs to be a general solution. Thanks!

Answer 1

I think I understand what you want. This certainly works for your example.

sub(".*?([^()]+\\(\\d+\\))$", "\\1", x)
[1] "WB (16)"    "WB (15)"    "TN-SE (17)"

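Details: This looks for something of the form SomeStuff (Numbers) at the end of the string and throws away everything before it. SomeStuff is not allowed to contain parentheses.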
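As a sketch of the same idea without the lazy .*?, the trailing token can also be pulled out directly with regexpr() and regmatches() from base R. The names m and last_code are made up for this illustration, and the pattern assumes every element ends in a code followed by digits in parentheses.

```r
x <- c("WB (16)", "CT (14)WB (15)",
       "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")

# Locate the run of non-parenthesis characters plus "(digits)" that ends each string.
m <- regexpr("[^()]+\\([0-9]+\\)$", x)

# regmatches() returns only the matched text; elements without a match are dropped,
# so the result can be shorter than x if some element does not fit the pattern.
last_code <- regmatches(x, m)
print(last_code)
```

Because the match is anchored with $ and the code part cannot contain parentheses, only the last token of each string can match, whether or not there is a space before the opening parenthesis.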

Answer 2

Could you please try the following?

sub(".*[0-9]+[^)]\\)?([^)$])", "\\1", x)

The output is as follows.

[1] "WB (16)"    "WB (15)"    "TN-SE (17)"

The input is as follows.

> x
[1] "WB (16)"                                   "CT (14)WB (15)"                           
[3] "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)"

Explanation: the annotated breakdown below is for explanation only and is not meant to be run as written.

sub("                 ## Base R's sub() replaces the first match of a pattern in each element.
                      ## Usage: sub(pattern, replacement, x)
.*[0-9]+[^)]\\)      ## .* greedily consumes as much as possible, followed by one or more digits, one character that is not ")", and a literal ")" closing an earlier token.
?                     ## The ? makes that ")" optional. No lookahead is involved; the regex engine simply backtracks until the rest of the pattern fits.
([^)$])",             ## Capture group 1: a single character that is neither ")" nor "$" (inside [] the $ is literal). It is the first character of the last token, because nothing follows the final ")".
"\\1",                ## Replace the whole match with group 1, which puts that first character back.
x)                    ## The input vector.
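A quick check of this pattern, as a sketch. The vector y is a made-up input with single-digit numbers and does not come from the question. Since [0-9]+[^)] needs a digit followed by one more character that is not ")", the pattern seems to rely on the numbers having at least two digits.

```r
pattern <- ".*[0-9]+[^)]\\)?([^)$])"

x <- c("WB (16)", "CT (14)WB (15)",
       "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")
res_x <- sub(pattern, "\\1", x)
print(res_x)

# Hypothetical input with single-digit numbers (not from the question).
y <- "A (5)B (6)"
res_y <- sub(pattern, "\\1", y)
print(res_y)  # comes back unchanged because the pattern finds no match
```

If single-digit numbers can occur in the real data, a pattern that describes the token itself, such as the one in Answer 1, is probably the safer choice.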

Tags: regex, r, gsub, dataframe