Parsing dates from a heterogeneous date column

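I have an odd date column with no separators between the month, day, and year. The main difference between these dates is the missing 0 in the month or day.

Date    Actual date
90898   1998-09-08
91198   1998-11-08
100298  1998-09-08
10599   1999-10-05
31699   1999-03-16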
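The missing zero is what makes five-character values ambiguous: it may have been dropped from the month or from the day. As a minimal base-R sketch (the helper `restore_zero` is hypothetical, not part of the question), the two possible readings of the same string can be produced like this:

```r
# Hypothetical helper: restore the dropped leading zero of a compact m/d/yy
# string. Assumes 6-character values are already complete and 5-character
# values lost exactly one zero, either from the month (inserted at position 1)
# or from the day (inserted at position 3).
restore_zero <- function(x, where = c("month", "day")) {
  where <- match.arg(where)
  x <- as.character(x)
  short <- nchar(x) == 5
  x[short] <- if (where == "month") {
    paste0("0", x[short])                                        # 10599 -> 010599
  } else {
    paste0(substr(x[short], 1, 2), "0", substr(x[short], 3, 5))  # 10599 -> 100599
  }
  x
}

restore_zero(c("90898", "10599", "100298"), "month")
# [1] "090898" "010599" "100298"
restore_zero(c("90898", "10599", "100298"), "day")
# [1] "900898" "100599" "100298"
```

With the month reading, 10599 becomes 010599 (January 5); with the day reading it becomes 100599 (October 5), which is the date listed in the table above. For 90898 only the month reading gives a valid month.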

I want to convert them to a regular date format with as.Date from base R or parse_date_time from the lubridate package, but I get NA. For example:

# some sample dates
dates <- structure(list(Dates = c("103001", "41400", "90501", "92200", 
"102999", "102401", "12800", "91900", "111901", "83199", "31700", 
"11400", "112200", "91099", "52199", "101101", "81999", "50401", 
"92701", "80801", "81601", "111600", "90799", "110998", "42001", 
"51801", "121498", "100899", "91499", "92598", "51900", "112499", 
"63000", "110601", "31699", "101698", "112398", "20201", "22301", 
"10599", "71101", "122898", "92899", "72799", "80400", "21100", 
"72800", "12099", "81100", "101599", "90399", "50400", "120800", 
"91898", "60299", "62701", "100298", "72501", "10300", "92600", 
"31601", "21800", "30999", "30200", "92499", "60200", "10902", 
"62300", "81800", "61301", "92998", "42199", "71400", "12902", 
"31902", "101999", "62199", "43099", "111698", "72100", "22399", 
"40402", "82301", "110398", "102798", "60900", "100300", "102098", 
"22002", "102700", "83001", "81199")), row.names = c(NA, -92L
), class = c("tbl_df", "tbl", "data.frame"))

I have tried the following calls:

as.Date(dates$Dates,format="%m%d%y")

as.Date(dates$Dates,format="%m%j%y")

as.Date(dates$Dates,format="%n%j%y")

lubridate::parse_date_time(dates$Dates,orders="mdy")

and they all return NA across the dataset. Any idea what is going wrong here?

We can insert a 0 in front of the values that have only 5 characters and then apply mdy:

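library(lubridate)
# pad 5-character values with a leading 0, turn a trailing "00" into "20", then parse as month/day/year
mdy(sub("00$", "20", sub("^(.)(..)(..)$", "0\\1\\2\\3", dates$Dates)))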
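For comparison, here is a minimal base-R sketch of the same padding step without lubridate. Like the regex above, it assumes the dropped zero always belongs to the month. It leaves out the `sub("00$", "20", ...)` step, which turns a year of 00 into 2020; with `format = "%m%d%y"`, `%y` maps 00 to 68 onto 20xx and 69 to 99 onto 19xx, so 00 becomes 2000. That fits the 98 to 02 range of the sample, assuming 00 is meant to be 2000.

```r
# A few values taken from the question's sample
dates_raw <- c("90898", "41400", "103001", "31699")

# Assumption: a 5-character value always lost the month's leading zero
padded <- ifelse(nchar(dates_raw) == 5, paste0("0", dates_raw), dates_raw)

# %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999
parsed <- as.Date(padded, format = "%m%d%y")
parsed
# [1] "1998-09-08" "2000-04-14" "2001-10-30" "1999-03-16"
```

On the full data this would be `as.Date(ifelse(nchar(dates$Dates) == 5, paste0("0", dates$Dates), dates$Dates), format = "%m%d%y")`.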

-output

[1] "2001-10-30" "2020-04-14" "2001-09-05" "2020-09-22" "1999-10-29" "2001-10-24" "2020-01-28" "2020-09-19" "2001-11-19" "1999-08-31" "2020-03-17"
[12] "2020-01-14" "2020-11-22" "1999-09-10" "1999-05-21" "2001-10-11" "1999-08-19" "2001-05-04" "2001-09-27" "2001-08-08" "2001-08-16" "2020-11-16"
[23] "1999-09-07" "1998-11-09" "2001-04-20" "2001-05-18" "1998-12-14" "1999-10-08" "1999-09-14" "1998-09-25" "2020-05-19" "1999-11-24" "2020-06-30"
[34] "2001-11-06" "1999-03-16" "1998-10-16" "1998-11-23" "2001-02-02" "2001-02-23" "1999-01-05" "2001-07-11" "1998-12-28" "1999-09-28" "1999-07-27"
[45] "2020-08-04" "2020-02-11" "2020-07-28" "1999-01-20" "2020-08-11" "1999-10-15" "1999-09-03" "2020-05-04" "2020-12-08" "1998-09-18" "1999-06-02"
[56] "2001-06-27" "1998-10-02" "2001-07-25" "2020-01-03" "2020-09-26" "2001-03-16" "2020-02-18" "1999-03-09" "2020-03-02" "1999-09-24" "2020-06-02"
[67] "2002-01-09" "2020-06-23" "2020-08-18" "2001-06-13" "1998-09-29" "1999-04-21" "2020-07-14" "2002-01-29" "2002-03-19" "1999-10-19" "1999-06-21"
[78] "1999-04-30" "1998-11-16" "2020-07-21" "1999-02-23" "2002-04-04" "2001-08-23" "1998-11-03" "1998-10-27" "2020-06-09" "2020-10-03" "1998-10-20"
[89] "2002-02-20" "2020-10-27" "2001-08-30" "1999-08-11"
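Every five-character value is read here as if the month lost its zero, so "10599" comes out as "1999-01-05" even though the table in the question lists 1999-10-05. As a minimal sketch (the helper `read_mdy` is hypothetical), the values where both readings give a real date can be flagged for manual review:

```r
# Hypothetical helper: parse one reading of a compact m/d/yy value.
# Assumes the last two characters are the year and that a 5-character
# value lost one leading zero, either from the month or from the day.
read_mdy <- function(x, zero_from = c("month", "day")) {
  zero_from <- match.arg(zero_from)
  n <- nchar(x)
  # Number of leading characters that belong to the month
  mlen  <- ifelse(n == 5 & zero_from == "month", 1L, 2L)
  month <- as.integer(substr(x, 1, mlen))
  day   <- as.integer(substr(x, mlen + 1, n - 2))
  yy    <- as.integer(substr(x, n - 1, n))
  year  <- ifelse(yy <= 68, 2000L + yy, 1900L + yy)  # same century pivot as %y
  # ISOdate() gives NA for impossible dates such as month 90 or 31
  as.Date(ISOdate(year, month, day))
}

x <- c("90898", "10599", "12800", "31699", "103001")
by_month <- read_mdy(x, "month")
by_day   <- read_mdy(x, "day")

# Ambiguous: both readings are valid dates and they disagree
ambiguous <- !is.na(by_month) & !is.na(by_day) & by_month != by_day
data.frame(x, by_month, by_day, ambiguous)
```

In this sample, "10599" (1999-01-05 or 1999-10-05) and "12800" (2000-01-28 or 2000-12-08) are flagged, while "90898" and "31699" have only one valid reading.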