R: gsub words between spaces
I have a text that looks like this:
a <- "233,236,241 solitude ΔE=1.9"
What I want to do is extract the second word, the one between the two spaces ( ), which should give this output:
> solitude
I tried two approaches:
a1 <- strsplit(a,' ',fixed=TRUE)[[1]][2]
a2 <- sapply(strsplit(a, " ", fixed=TRUE), "[", 2)
but both always return:
ΔE=1.9
What is the correct way to do this?
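With the string exactly as typed above (ordinary ASCII spaces), the first attempt already returns "solitude". Getting "ΔE=1.9" instead suggests, and this is only an assumption, that the first gap in the real data is a different whitespace character, for example a non-breaking space (U+00A0). A minimal sketch for checking which code points sit in the gaps, and for splitting on either kind of space; the string `a_nbsp` is hypothetical:

```r
# With ordinary ASCII spaces, the question's first attempt already works
a <- "233,236,241 solitude \u0394E=1.9"
ok <- strsplit(a, " ", fixed = TRUE)[[1]][2]
ok  # "solitude"

# Hypothetical input: ASSUMES the first gap is a non-breaking space (U+00A0)
a_nbsp <- "233,236,241\u00a0solitude \u0394E=1.9"

# Code points of the blank-looking characters: 32 = ASCII space, 160 = NBSP
cp <- utf8ToInt(a_nbsp)
cp[cp %in% c(32L, 160L)]

# Splitting on " " alone misses the NBSP, reproducing the reported result
bad <- strsplit(a_nbsp, " ", fixed = TRUE)[[1]][2]

# A character class containing both kinds of space recovers the second word
word2 <- strsplit(a_nbsp, "[ \u00a0]+")[[1]][2]
word2  # "solitude"
```

If `utf8ToInt()` shows some other code point in your data, add that character to the class instead.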
Try this:
gsub("\\s.+$", "", gsub("^.+[[:digit:]]\\s", "", a))
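The nested call above works in two passes. This sketch splits them into named intermediate variables (`step1`, `step2`, chosen here for illustration) so each pass can be inspected; in an R string literal the regex escape `\s` must be written `\\s`:

```r
a <- "233,236,241 solitude \u0394E=1.9"

# Pass 1: greedy ".+" backtracks to the LAST digit that is followed by
# whitespace (the "1" of "241"), and everything up to that space is removed
step1 <- gsub("^.+[[:digit:]]\\s", "", a)

# Pass 2: remove the first whitespace character and everything after it
step2 <- gsub("\\s.+$", "", step1)
step2  # "solitude"
```

The first pass relies on the first field ending in a digit; a string whose first word ends in a letter would pass through it unchanged.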
Here is an approach that uses capture groups (the patterns inside parentheses) and character classes (the patterns inside square brackets).
sub("(^[^ ]*[ ])([^ ]*)([ ].*$)", "\\2", a)
[1] "solitude"
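Because `sub()` is vectorised, the same pattern can be applied to a whole character vector. A sketch with made-up sample strings (only the first comes from the question); a string that does not contain two spaces does not match the pattern, so `sub()` returns it unchanged rather than returning an empty string or `NA`:

```r
# Hypothetical sample strings; only the first one is from the question
x <- c("233,236,241 solitude \u0394E=1.9",
       "12,34,56 ocean \u0394E=0.4",
       "no_spaces_here")

# Keep only capture group 2 (the text between the first and second space)
out <- sub("(^[^ ]*[ ])([^ ]*)([ ].*$)", "\\2", x)
out  # "solitude" "ocean" "no_spaces_here"
```

If non-matching strings should become `NA` instead, they can be detected beforehand with `grepl()` on the same pattern.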
Annotations for the first capture group, (^[^ ]*[ ]):
^     marks the beginning of the character value
[^ ]  inside a character class, a '^' as the first character negates the class; this one contains only the space character, so it matches any character except a space
*     repeats that an arbitrary number of times
[ ]   finds the first space
Second capture group, ([^ ]*):
[^ ]  negation of a character class containing only the space character, i.e. any non-space character
*     an arbitrary number of times (this is the second word)
Third capture group, ([ ].*$):
[ ]   the second space
.*$   anything after the second space, up to the end of the string
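The "\\<n>" entries in the replacement argument refer to the text matched by the n-th capture group, counting groups in the order they appear in the pattern argument. In an R string literal they are written "\\1", "\\2", and so on; a single "\2" would be read as an octal character escape.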
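To see exactly what each capture group matches, `regexec()` plus `regmatches()` return the full match followed by every group in order, so element n + 1 is what "\\n" would insert. A short sketch using the answer's pattern:

```r
a <- "233,236,241 solitude \u0394E=1.9"
pattern <- "(^[^ ]*[ ])([^ ]*)([ ].*$)"

# Element 1 is the whole match; elements 2-4 are capture groups 1-3
groups <- regmatches(a, regexec(pattern, a))[[1]]
length(groups)  # 4

# Capture group 2, i.e. what "\\2" substitutes in sub()
groups[3]  # "solitude"
```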