Extract year from string and text data
I need to extract the start year and end year from a vector whose values look like this.
yr<- c("June 2013 – Present (2 years 9 months)", "January 2012 – June 2013 (1 year 6 months)","2006 – Present (10 years)","2002 – 2006 (4 years)")
yr
June 2013 – Present (2 years 9 months)
January 2012 – June 2013 (1 year 6 months)
2006 – Present (10 years)
2002 – 2006 (4 years)
I am expecting output like this. Does anyone have any suggestions?
start_yr end_yr
2013 2016
2012 2013
2006 2016
2002 2006
x <- gsub("present", "2016", yr, ignore.case = TRUE)
x <- regmatches(x, gregexpr("\\d{4}", x))
start_yr <- sapply(x, "[[", 1)
end_yr <- sapply(x, "[[", 2)
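This saves the start year and end year in 2 separate variables. If you would rather keep them in a single object, just edit the code to assign to y$start_yr and y$end_yr instead.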
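The single-object variant mentioned above could look roughly like this. It is only a sketch: it assumes a data frame is an acceptable container, it keeps the hard-coded 2016 for "Present", and the integer conversion is an addition that is not in the original answer. The name `m` is just an illustrative intermediate.

```r
# Sample input copied from the question
yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")

# Replace "Present" with a fixed year (2016, as in the answer)
x <- gsub("present", "2016", yr, ignore.case = TRUE)

# Extract every run of four digits from each string
m <- regmatches(x, gregexpr("[0-9]{4}", x))

# Collect both years in one data frame called y (assumed container),
# converting the matched strings to integers
y <- data.frame(
  start_yr = as.integer(sapply(m, `[[`, 1)),
  end_yr   = as.integer(sapply(m, `[[`, 2))
)
y
```

Note that `[[` with index 2 will fail for any element that contains fewer than two four-digit numbers, so this only works when every entry has both a start and an end year after the "Present" substitution.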
Another solution is to use the stringr package:
library(stringr)
x <- str_replace(yr, "Present", "2016")
DF <- as.data.frame(str_extract_all(x, "\\d{4}", simplify = TRUE))
names(DF) <- c("start_yr", "end_yr")
DF
You will get
start_yr end_yr
1 2013 2016
2 2012 2013
3 2006 2016
4 2002 2006