How to apply sub() only when a string contains multiple values
Here is a short example taken from my data frame:
x<- c("WB (16)","CT (14)WB (15)","ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")
My question is how to get sub(".*?)", "", x)
(or a different function) to work so that I end up with the following result:
x<-c("WB (16)","WB (15)","TN-SE(17)")
instead of
x<-c("","WB (15)")
The codes can consist of other letters as well (so not only WB, CT and TN-SE), for example:
"NBIO(15)" "CITG-TP(08)" "BK-AR(10)"
So it needs to be a general solution...
Thanks!
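For reference, the behaviour described in the question can be reproduced with the short sketch below. It is only an illustration: the closing parenthesis is escaped and perl = TRUE is set, both of which are assumptions made here so that the pattern is unambiguous, and the name first_removed is hypothetical.

```r
# Example vector from the question.
x <- c("WB (16)", "CT (14)WB (15)",
       "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)")

# A lazy ".*?" followed by a literal ")" deletes everything up to and
# including the FIRST closing parenthesis, whether or not more entries follow.
# Assumption: "\\)" and perl = TRUE are used here; the question wrote ".*?)".
first_removed <- sub(".*?\\)", "", x, perl = TRUE)
print(first_removed)
```

The single-entry string becomes empty and the three-entry string still holds two extra entries, which is why the pattern needs to be tied to the end of the string rather than to the first ")".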
I think I understand what you want. This certainly works for your example.
sub(".*?([^()]+\\(\\d+\\))$", "\\1", x)
[1] "WB (16)" "WB (15)" "TN-SE (17)"
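Details: This looks for something of the form SomeStuff (Numbers) at the end of the string
and discards everything before it. SomeStuff is not allowed to contain parentheses.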
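The idea of anchoring on the last entry can be tried out with a minimal sketch like the one below. The wrapper name keep_last and the fourth input string (built from the space-free codes mentioned in the question) are made up for illustration, and perl = TRUE is an assumption added here so that the lazy quantifier follows PCRE semantics; the call in the answer omits it.

```r
# Example vector from the question, plus one made-up string without spaces.
x <- c("WB (16)",
       "CT (14)WB (15)",
       "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)",
       "NBIO(15)CITG-TP(08)BK-AR(10)")

# keep_last() is a hypothetical helper, not part of base R.
# The capture group holds "text without parentheses" + "(digits)" and must
# end exactly at the end of the string ($); everything before it is dropped.
keep_last <- function(v) {
  sub(".*?([^()]+\\(\\d+\\))$", "\\1", v, perl = TRUE)
}

result <- keep_last(x)
print(result)
```

Strings with a single entry come back unchanged, because the capture group then covers the whole string and is substituted for itself.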
Could you please try the following.
sub(".*[0-9]+[^)]\\)?([^)$])", "\\1", x)
The output is as follows.
[1] "WB (16)" "WB (15)" "TN-SE (17)"
The input is as follows.
> x
[1] "WB (16)" "CT (14)WB (15)"
[3] "ET (13)CITG-TILm (16)EE-SS (17)TN-SE (17)"
Explanation: the breakdown below is for illustration only and is not runnable as written.
sub(" ##sub() from base R, called as sub(pattern, replacement, x); it replaces the first match of the pattern in each element of x.
.*[0-9]+[^)]\\) ##.* greedily consumes as much as it can, [0-9]+ then matches digits, [^)] one character that is not ")", and \\) a literal ")". Because .* is greedy, the match ends at the last ")" that is still followed by another character, i.e. a ")" that is not at the end of the string.
? ##The ? makes the preceding \\) optional.
([^)$])", ##( ) is a capture group that R remembers for the replacement; here it stores one character that is neither ")" nor a literal "$", which is the first character of the last entry.
"\\1", ##Replace the matched text with \\1, the content of the first capture group.
x) ##The name of the input vector.
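Since this pattern expects at least one more character between the first matched digit and the ")", it may be worth checking it against numbers of other lengths before relying on it. The sketch below compares it with the end-anchored pattern from the other answer on made-up inputs; the vector y and the result names are hypothetical, and perl = TRUE is an assumption added for consistent behaviour.

```r
# Made-up inputs (not from the question): one with single-digit numbers,
# one with two-digit numbers like the original data.
y <- c("CT (4)WB (5)", "CT (14)WB (15)")

# Pattern from this answer: digits, one non-")" character, optional ")",
# then capture the next character.
by_digits <- sub(".*[0-9]+[^)]\\)?([^)$])", "\\1", y, perl = TRUE)

# End-anchored pattern from the other answer, for comparison.
by_anchor <- sub(".*?([^()]+\\(\\d+\\))$", "\\1", y, perl = TRUE)

print(by_digits)
print(by_anchor)
```

On the single-digit string the first pattern finds no match and returns the input unchanged, while the end-anchored pattern returns "WB (5)"; on the two-digit string both return "WB (15)".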