Extract two substrings from a single string in R

Question

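I have a text field like this: -- :location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z

I want to extract 12.839006423950195 and 77.6580810546875 and put them into separate columns of the same data frame.

The numbers vary in length, so the only approach I can see is to extract whatever lies between the first and second single quotes, and between the third and fourth.

I tried str_locate_all and str_match_all, but I can't figure them out. Please help.

Thanks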

Answer 1

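We can use str_extract_all from library(stringr). The regular expression uses lookarounds to match one or more digits or decimal points ([0-9.]+) that sit between single quotes: (?<=') requires a preceding quote and (?=') requires a following one.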

library(stringr)
lst <- lapply(str_extract_all(txt, "(?<=')[0-9.]+(?=')"), as.numeric)

If the list elements all have the same length,

df1 <- setNames(do.call(rbind.data.frame, lst), paste0('V', 1:2))

gives a 2-column 'data.frame'.

Data

txt <- ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z"
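Since the question asks for the values to end up as columns of the same data frame, here is a minimal sketch of one way that might look, using str_match with two capture groups instead of lookarounds. The data frame `dat`, its column `loc`, and the second record are made up for illustration; the output column names V1 and V2 simply follow the naming used above.

```r
library(stringr)

# Hypothetical data frame: the column name `loc` and the second row are assumptions
dat <- data.frame(
  loc = c(
    ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z",
    ":location: - '13.5' - '78.25' :last_location_update: 2015-08-11 09:00:00.000000000 Z"
  ),
  stringsAsFactors = FALSE
)

# str_match returns a character matrix: column 1 is the full match,
# columns 2 and 3 are the two capture groups (the quoted numbers)
m <- str_match(dat$loc, "'([0-9.]+)'[^']*'([0-9.]+)'")

# Add the extracted values as numeric columns of the same data frame
dat$V1 <- as.numeric(m[, 2])
dat$V2 <- as.numeric(m[, 3])
```

Rows where the pattern does not match get NA in both new columns, so records with differing formats do not break the assignment.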

Answer 2

This can also be done without any libraries:

txt <- ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z"
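# Locate each quoted number; adding 1 skips the opening quote
start <- gregexpr("('.*?)[0-9.](.*?')+", txt)[[1]] + 1
# match.length counts both quotes, so subtracting 3 lands on the last digit
end <- start + attr(start, "match.length") - 3
# Cut out both substrings and arrange them as one row with two columns
df <- data.frame(t(apply(cbind(start[1:2], end[1:2]), 1, function(x) substr(txt, x[1], x[2]))))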

> df
                  X1               X2
1 12.839006423950195 77.6580810546875

Thanks to @thelatemail:

txt <- ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z"
df<-data.frame(t(regmatches(txt, gregexpr("(?<=')[0-9.]+(?=')",txt,perl=TRUE))[[1]]))
df

                  X1               X2
1 12.839006423950195 77.6580810546875
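If there are several such strings rather than one, the same base R calls accept a character vector. The following is only a sketch under that assumption: the vector `txts`, its second element, and the name `df_all` are invented for illustration.

```r
# Hypothetical input: two records, the second one is made up
txts <- c(
  ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z",
  ":location: - '13.5' - '78.25' :last_location_update: 2015-08-11 09:00:00.000000000 Z"
)

# regmatches returns a list with one character vector of matches per input string
matches <- regmatches(txts, gregexpr("(?<=')[0-9.]+(?=')", txts, perl = TRUE))

# Keep the first two matches of each record and stack them, one record per row
df_all <- as.data.frame(do.call(rbind, lapply(matches, `[`, 1:2)),
                        stringsAsFactors = FALSE)
names(df_all) <- c("X1", "X2")
```

The columns are still character here; wrap them in as.numeric if numbers are needed.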
