Extract year from string and text data

Question

I need to extract the start year and the end year from a vector of values like these.

 yr<- c("June 2013 – Present (2 years 9 months)", "January 2012 – June 2013 (1 year 6 months)","2006 – Present (10 years)","2002 – 2006 (4 years)")


 yr
 June 2013 – Present (2 years 9 months)
 January 2012 – June 2013 (1 year 6 months)
 2006 – Present (10 years)
 2002 – 2006 (4 years)

This is the output I am expecting. Does anyone have any suggestions?

 start_yr       end_yr

2013            2016
2012            2013
2006            2016
2002            2006

Answer 1

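# Treat "Present" as 2016, matching the expected output in the question
x <- gsub("present", "2016", yr, ignore.case = TRUE)
# Extract every 4-digit number; the backslash must be doubled in an R string
x <- regmatches(x, gregexpr("\\d{4}", x))
# The first match is the start year, the second is the end year
start_yr <- sapply(x, "[[", 1)
end_yr <- sapply(x, "[[", 2)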

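This stores the start and end years in two separate variables. If you would rather keep them in a single object, edit the code to assign to y$start_yr and y$end_yr instead.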
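A minimal sketch of the single-object variant, using a data frame with integer columns. The name `y` follows the suggestion above, and treating "Present" as 2016 is an assumption taken from the expected output in the question, not a general rule.

```r
# Sample vector from the question.
yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")

# Assumption: "Present" stands for 2016, as in the expected output.
x <- gsub("present", "2016", yr, ignore.case = TRUE)

# One character vector of 4-digit matches per input string.
years <- regmatches(x, gregexpr("\\d{4}", x))

# `y` is an illustrative name; as.integer() turns the matched text into numbers.
y <- data.frame(
  start_yr = as.integer(sapply(years, "[[", 1)),
  end_yr   = as.integer(sapply(years, "[[", 2))
)
y
```

Note that `"[[", 2` fails with a subscript error if a string contains fewer than two years, so this sketch only fits input shaped like the sample.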

Answer 2

Another solution is to use the stringr package.

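library(stringr)
# The replacement must be a string, so "2016" is quoted
x <- str_replace(yr, "Present", "2016")
# simplify = TRUE returns a character matrix with one row per input string
DF <- as.data.frame(str_extract_all(x, "\\d{4}", simplify = TRUE))
names(DF) <- c("start_yr", "end_yr")
DF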

You will get

  start_yr end_yr
1     2013   2016
2     2012   2013
3     2006   2016
4     2002   2006
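The columns in that result are text rather than numbers, and the year 2016 is hard-coded. The variation below is a sketch, not part of the original answer: it substitutes the current year for "Present" and converts the matches to integers. The name `present_year` and the stricter pattern (a standalone four-digit number starting with 19 or 20) are assumptions made for illustration.

```r
library(stringr)

# Sample vector from the question.
yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")

# Assumption: "Present" means the year in which the code is run.
present_year <- format(Sys.Date(), "%Y")

# Case-insensitive replacement; the replacement is already a string.
x <- str_replace(yr, regex("present", ignore_case = TRUE), present_year)

# Character matrix with one row per input string and one column per match.
m <- str_extract_all(x, "\\b(19|20)\\d{2}\\b", simplify = TRUE)

DF <- data.frame(start_yr = as.integer(m[, 1]),
                 end_yr   = as.integer(m[, 2]))
DF
```

With this approach the end year of the "Present" rows changes depending on when the code is run, so it will not reproduce the 2016 values shown above.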
