R mdy_hms unpredictable results?

Question

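I'm using the mdy_hms function on some data I have and ran into an interesting problem. I upload data from many sources, but they should all be CSV files that follow the same guidelines, so they should all be in the same format.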

I have 2 variables.

> good_time
[1] "12/28/2019 16:22"
> test_time
[1] "3/4/2020 16:46"
> str(good_time)
chr "12/28/2019 16:22"
> str(test_time)
chr "3/4/2020 16:46"

So they look like the same format to me, but good_time parses fine with mdy_hms while test_time does not. Can anyone explain why?

> mdy_hms(good_time)
[1] "2020-12-28 19:16:22 UTC"
> mdy_hms(test_time)
[1] NA
Warning message:
All formats failed to parse. No formats found.

Oddly, if I use mdy_hm(test_time), it works fine.

> mdy_hm(test_time)
[1] "2020-03-04 16:46:00 UTC"

Answer 1

lubridate expects leading zeros on single-digit months (and days).

From ?lubridate::mdy_hms:

truncated: integer, indicating how many formats can be missing. See
          details.

...

     The most common type of irregularity in date-time data is the
     truncation due to rounding or unavailability of the time stamp. If
     the 'truncated' parameter is non-zero, the 'ymd_hms()' functions
     also check for truncated formats. For example, 'ymd_hms()' with
     'truncated = 3' will also parse incomplete dates like 2012-06-01
     12:23, 2012-06-01 12 and '2012-06-01'. NOTE: The 'ymd()' family of
     functions is based on 'base::strptime()' which currently fails to
     parse %y-%m formats.

Just add truncated=1:

lubridate::mdy_hms("3/4/2020 16:46", truncated=1)
# [1] "2020-03-04 16:46:00 UTC"

(This is also discussed in tidyverse/lubridate#669.)
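A side note on the question's output: the "good" result is itself a misparse. "12/28/2019 16:22" came back as 2020-12-28 19:16:22, so the digits were apparently split as year 20, hour 19, minute 16, second 22 to satisfy the hms order. Since these timestamps have no seconds field, a parser whose order matches the data avoids both the NA and the misparse. The sketch below assumes every input is month/day/year hour:minute and should be read as UTC; the vector `x` simply holds the two sample strings from the question. It also shows a base R equivalent using `as.POSIXct` with an explicit format.

```r
library(lubridate)

# The two sample strings from the question (assumed: no seconds, UTC).
x <- c("12/28/2019 16:22", "3/4/2020 16:46")

# The strings carry hours and minutes but no seconds, so use the
# month-day-year, hour-minute parser instead of mdy_hms().
parsed <- mdy_hm(x, tz = "UTC")
print(parsed)

# Base R alternative without lubridate: %m and %d accept single-digit
# months and days on input, so no leading zeros are needed.
parsed_base <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")
print(parsed_base)
```

Both calls should give 2019-12-28 16:22 and 2020-03-04 16:46 (UTC). If some files do include seconds and others don't, the truncated=1 approach above is the more flexible option.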
