R - Extract info after nth occurrence of a character from the right of string
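I've seen many iterations of extracting with gsub, but they mostly deal with extracting from left to right or after a single occurrence. I want to match from right to left, counting four occurrences of "-", and extract everything between the 3rd and 4th occurrences.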
For example:
string outcome
here-are-some-words-to-try some
a-b-c-d-e-f-g-h-i f
Here are some of the references I tried to use:
regex - return all before the second occurrence
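x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
sapply(x, function(strings) {
  # positions of every "-" (gregexpr returns -1 if there are none)
  ind <- unlist(gregexpr(pattern = "-", text = strings))
  if (length(ind) < 4) {
    NA  # fewer than four dashes: nothing to extract
  } else {
    # text between the 4th-last and 3rd-last dashes
    substr(strings, ind[length(ind) - 3] + 1, ind[length(ind) - 2] - 1)
  }
})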
#here-are-some-words-to-try a-b-c-d-e-f-g-h-i
# "some" "f"
You could use
([^-]+)(?:-[^-]+){3}$
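which captures a run of non-dash characters that is followed by exactly three more "-chunk" pieces and then the end of the string. In R, this could be: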
library(dplyr)
library(stringr)
df <- data.frame(string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', ' no dash in here'),
                 stringsAsFactors = FALSE)
# column 1 of str_match() is the full match, column 2 is capture group 1
df <- df %>%
  mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[, 2])
df
which yields:
string outcome
1 here-are-some-words-to-try some
2 a-b-c-d-e-f-g-h-i f
3 no dash in here <NA>
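If you'd rather not depend on dplyr and stringr, the same pattern can also be applied in base R with regexec() and regmatches(). This is a sketch: extract_fourth_from_right is a hypothetical helper name, and perl = TRUE is an assumption chosen so that PCRE handles the non-capturing group (?:...). Because regmatches() returns character(0) for strings that don't match, the sketch maps those to NA to mirror str_match().

```r
# Base R sketch of the same regex; extract_fourth_from_right is a hypothetical name
extract_fourth_from_right <- function(x) {
  # ([^-]+)       : the dash-free chunk we want (capture group 1)
  # (?:-[^-]+){3} : exactly three more "-chunk" pieces after it
  # $             : ...which must run to the end of the string
  m <- regmatches(x, regexec("([^-]+)(?:-[^-]+){3}$", x, perl = TRUE))
  # each element is c(full_match, group_1), or character(0) when there is no match
  vapply(m, function(v) if (length(v) == 2) v[2] else NA_character_, character(1))
}

x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no dash in here")
extract_fourth_from_right(x)
# [1] "some" "f"    NA
```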
How about splitting the string? Something like:
string <- "here-are-some-words-to-try"
# separate all words
val <- strsplit(string, "-")[[1]]
# reverse the order
val <- rev(val)
# take the 4th element
val[4]
# And using a dataframe
library(tidyverse)
tibble(string = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")) %>%
  mutate(outcome = map_chr(string, function(s) rev(strsplit(s, "-")[[1]])[4]))
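Since the title asks about the nth occurrence in general, the split-and-reverse idea can be parameterised. In this sketch, nth_piece_from_right and its k and sep arguments are hypothetical names, not from any package; the question's case (the text between the 3rd and 4th dash from the right) corresponds to k = 4.

```r
# Hypothetical helper: return the k-th sep-separated piece, counting from the right.
# The question's case (between the 3rd and 4th "-" from the right) is k = 4.
nth_piece_from_right <- function(x, k, sep = "-") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  pieces <- strsplit(x, sep, fixed = TRUE)
  # rev() puts the last piece first; indexing past the end yields NA
  vapply(pieces, function(p) rev(p)[k], character(1))
}

x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no dash in here")
nth_piece_from_right(x, 4)
# [1] "some" "f"    NA
nth_piece_from_right(x, 2)
# [1] "to" "h"  NA
```

One difference worth knowing: strsplit() drops a trailing empty piece, so "a-b-c-d-" is treated like "a-b-c-d", whereas the regex ([^-]+)(?:-[^-]+){3}$ does not match a string that ends in a dash.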