Dates containing weekday/month/day/year can't be parsed correctly by lubridate

Question

I downloaded a database from a website, and its date column is formatted as follows:

x <- c("Fri, Mar 1, 2019", "Sat, Mar 2, 2019", "Sun, Mar 3, 2019", "Mon, Mar 4, 2019", "Tue, Mar 5, 2019", "Wed, Mar 6, 2019", "Thu, Mar 7, 2019", "Fri, Mar 8, 2019", "Sat, Mar 9, 2019", "Sun, Mar 10, 2019", "Mon, Mar 11, 2019", "Tue, Mar 12, 2019", "Wed, Mar 13, 2019", "Thu, Mar 14, 2019", "Fri, Mar 15, 2019", "Sat, Mar 16, 2019", "Sun, Mar 17, 2019", "Mon, Mar 18, 2019", "Tue, Mar 19, 2019", "Wed, Mar 20, 2019", "Thu, Mar 21, 2019", "Fri, Mar 22, 2019", "Sat, Mar 23, 2019", "Sun, Mar 24, 2019", "Mon, Mar 25, 2019",  "Tue, Mar 26, 2019", "Wed, Mar 27, 2019", "Thu, Mar 28, 2019", "Fri, Mar 29, 2019", "Sat, Mar 30, 2019", "Sun, Mar 31, 2019")

It contains the dates from March 1 to March 31. I tried to convert it to Date format, so I used the mdy function from lubridate:

library("lubridate")
mdy(x)

This resulted in the following vector:

 [1] "2019-03-01" "2019-03-02" "2019-03-20" "2019-04-20" "2019-05-20" "2019-03-06"
 [7] "2019-03-07" "2019-03-08" "2019-03-09" "2019-10-20" "2019-11-20" "2019-12-20"
[13] "2019-03-13" "2019-03-14" "2019-03-15" "2019-03-16" "2019-03-17" "2019-03-18"
[19] "2019-03-19" "2019-03-20" "2019-03-21" "2019-03-22" "2019-03-23" "2019-03-24"
[25] "2019-03-25" "2019-03-26" "2019-03-27" "2019-03-28" "2019-03-29" "2019-03-30"
[31] "2019-03-31"

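As you can see, most of the dates are correct, but it fails for the 4th, 5th, 10th, 11th, and 12th of the month, where the day is read as the month (the 3rd is also off: it comes back as 2019-03-20). I have tried several solutions, but none has worked so far.

Some attempted solutions that did not work

Using a regex to remove the weekday from the character vector:

I thought one way to solve this would be to remove the weekday part of the string, so I tried to delete everything before the first comma, although I did not get it quite right (the leading comma is still there):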

library(stringr)
y <- str_extract(x, ",.*$")
y
 [1] ", Mar 1, 2019"  ", Mar 2, 2019"  ", Mar 3, 2019"  ", Mar 4, 2019" 
 [5] ", Mar 5, 2019"  ", Mar 6, 2019"  ", Mar 7, 2019"  ", Mar 8, 2019" 
 [9] ", Mar 9, 2019"  ", Mar 10, 2019" ", Mar 11, 2019" ", Mar 12, 2019"
 [13] ", Mar 13, 2019" ", Mar 14, 2019" ", Mar 15, 2019" ", Mar 16, 2019"
 [17] ", Mar 17, 2019" ", Mar 18, 2019" ", Mar 19, 2019" ", Mar 20, 2019"
 [21] ", Mar 21, 2019" ", Mar 22, 2019" ", Mar 23, 2019" ", Mar 24, 2019"
 [25] ", Mar 25, 2019" ", Mar 26, 2019" ", Mar 27, 2019" ", Mar 28, 2019"
 [29] ", Mar 29, 2019" ", Mar 30, 2019" ", Mar 31, 2019"

But now when I use mdy, the first 12 days all come out wrong.

mdy(y)

[1] "2019-01-20" "2019-02-20" "2019-03-20" "2019-04-20" "2019-05-20" "2019-06-20"
[7] "2019-07-20" "2019-08-20" "2019-09-20" "2019-10-20" "2019-11-20" "2019-12-20"
[13] "2019-03-13" "2019-03-14" "2019-03-15" "2019-03-16" "2019-03-17" "2019-03-18"
[19] "2019-03-19" "2019-03-20" "2019-03-21" "2019-03-22" "2019-03-23" "2019-03-24"
[25] "2019-03-25" "2019-03-26" "2019-03-27" "2019-03-28" "2019-03-29" "2019-03-30"
[31] "2019-03-31"
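
For reference, the pattern above keeps the leading ", " in every element. A minimal base R sketch that drops the weekday together with its separator might look like this (the name `stripped` and the two-element sample are only illustrative). It only cleans the strings; it does not by itself change how the month name gets interpreted.

```r
# Two sample values in the same format as the downloaded column.
x <- c("Fri, Mar 1, 2019", "Sun, Mar 10, 2019")

# Remove everything up to and including the first comma,
# plus any spaces that follow it.
stripped <- sub("^[^,]*, *", "", x)
stripped
```

The result still has to be parsed afterwards, so any ambiguity in the month abbreviation remains.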

Any ideas on how to solve this?

Session info

I added the sessionInfo() output as requested:

R version 3.4.4 (2018-03-15) 
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_CL.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_CL.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_CL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_CL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_1.3.1   dplyr_0.7.6     rvest_0.3.2     xml2_1.2.0      XML_3.98-1.16  
[6] lubridate_1.7.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     rstudioapi_0.7   knitr_1.20       bindr_0.1.1     
 [5] magrittr_1.5     tidyselect_0.2.4 R6_2.2.2         rlang_0.2.2     
 [9] httr_1.3.1       tools_3.4.4      pacman_0.4.6     selectr_0.4-1    
 [13] htmltools_0.3.6  yaml_2.2.0       rprojroot_1.3-2  digest_0.6.17   
 [17] assertthat_0.2.0 tibble_1.4.2     crayon_1.3.4     bindrcpp_0.2.2    
 [21] purrr_0.2.5      curl_3.2         glue_1.3.0       evaluate_0.11    
 [25] rmarkdown_1.10   stringi_1.2.4    pillar_1.3.0     compiler_3.4.4  
 [29] backports_1.1.2  pkgconfig_2.0.2 

As @duckmayr suggested, this is a locale issue. As shown above in my session info, my locale was set as follows:

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_CL.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_CL.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_CL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_CL.UTF-8 LC_IDENTIFICATION=C  
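
One way to see which names a locale-aware parser is likely working from is to print the abbreviated month and weekday names for the current LC_TIME setting. This is a small base R sketch; `month_abbr` and `weekday_abbr` are illustrative names, and the output depends on the machine's locale, so none is shown here.

```r
# Abbreviated month names as the current LC_TIME locale spells them.
month_abbr <- format(ISOdate(2019, 1:12, 1), "%b")

# Abbreviated weekday names, starting from Monday 2019-03-04.
weekday_abbr <- format(as.Date("2019-03-04") + 0:6, "%a")

Sys.getlocale("LC_TIME")
month_abbr
weekday_abbr
```

If these do not match the English abbreviations used in the data ("Mar", "Fri", and so on), the parser may be guessing at the format rather than recognizing the names.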

When I changed LC_TIME to en_US.UTF-8, everything was fixed. This is what I ran:

Sys.setlocale("LC_TIME", 'en_US.UTF-8')

After that, mdy works fine. I hope this helps anyone who runs into a similar problem in the future.
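
Since Sys.setlocale changes LC_TIME for the rest of the session, it may be preferable to switch it only while parsing. Below is a minimal base R sketch under that assumption. `with_time_locale` is a hypothetical helper, not a lubridate function, and it uses the "C" locale (English names) with base as.Date rather than mdy so that it runs without extra packages.

```r
# Hypothetical helper: evaluate `code` with LC_TIME temporarily set to
# `locale`, then restore the previous setting even if an error occurs.
with_time_locale <- function(locale, code) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", locale)
  force(code)  # `code` is evaluated lazily, i.e. after the switch
}

x <- c("Mon, Mar 4, 2019", "Sun, Mar 10, 2019")

# %a = abbreviated weekday, %b = abbreviated month, %d = day, %Y = year.
parsed <- with_time_locale("C", as.Date(x, format = "%a, %b %d, %Y"))
parsed
```

lubridate's parsers also take a `locale` argument that defaults to Sys.getlocale("LC_TIME"), so something like mdy(x, locale = "en_US.UTF-8") may achieve the same result without touching the session settings, provided that locale is installed on the system.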