R drops hours, minutes, and seconds from date

Question

While converting a data frame to xts, I realized there is a problem with the date formatting. Here is a sample data frame:

effective_date         price
"1990-01-01"  "100"
"1990-01-02 00:05:00"  "200"

This is sample output from the package I am using.

Converting it to xts should be simple:

xts(df["price"], order_by=as.POSIXct(df["effective_date"], format="%Y-%m-%d %H:%M:%S"))

However, this fails with an error saying that NAs are not allowed in row names, and the result is:

<NA>       100
1990-01-02 00:05:00  200

Apparently xts cannot work out how to handle the odd date there (midnight), and it will not coerce it.

Adding tz="UTC" to as.POSIXct does not help. Using as.POSIXlt does not change anything here either.

What can I do to coerce the midnight date into the correct format?

Answer 1

Assuming you want the timestamps, preprocess with something like:

temp <- c("1990-01-01", "1990-01-02 00:05:00")

# Match a date-only string at the end of the string (indicated by $) and
# replace it with the matched text (indicated by \\1) followed by 00:00:00
temp2 <- gsub("(\\d{4}-\\d{2}-\\d{2}$)", "\\1 00:00:00", temp)

# [1] "1990-01-01 00:00:00" "1990-01-02 00:05:00"
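
For completeness, here is a minimal sketch of how the padded strings might then be parsed. The UTC time zone and the name stamps are assumptions for illustration; the question does not say which zone the data is recorded in.

```r
temp <- c("1990-01-01", "1990-01-02 00:05:00")

# Pad date-only strings with a midnight time so every element has the same shape
temp2 <- gsub("(\\d{4}-\\d{2}-\\d{2}$)", "\\1 00:00:00", temp)

# With a uniform shape, one format string can parse every element.
# tz = "UTC" is an assumption; substitute the zone your data uses.
stamps <- as.POSIXct(temp2, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Check that nothing was turned into NA
anyNA(stamps)
```

If anyNA() reports TRUE on real data, some rows probably use a shape other than the two shown in the question.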

Answer 2

Two issues:

1) You cannot parse a date-only string into POSIXct with a format that also expects a time:

R> as.POSIXct(c("2017-01-02", "2017-01-03 04:05:06"), format="%Y-%m-%d %H:%M:%S")
[1] NA                        "2017-01-03 04:05:06 CST"
R>

2) But you can use the anytime() function to do it:

R> anytime::anytime(c("2017-01-02", "2017-01-03 04:05:06"))
[1] "2017-01-02 00:00:00 CST" "2017-01-03 04:05:06 CST"
R>
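
As a base-R aside on point 1: the NA appears because the format string demands a time component that the date-only string lacks. One possible workaround, which is not part of this answer, is to parse with the full format first and then retry only the failures with a date-only format. The sketch below assumes UTC, and its variable names are made up for illustration.

```r
x <- c("2017-01-02", "2017-01-03 04:05:06")

# First pass: full timestamp format; date-only strings come back as NA
parsed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Second pass: retry only the failures with a date-only format,
# which places them at midnight
failed <- is.na(parsed)
parsed[failed] <- as.POSIXct(x[failed], format = "%Y-%m-%d", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

This covers only the two shapes listed; a parser such as anytime() is more forgiving about other layouts.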

Once you have a POSIXct, forming the xts is easy.

Also note that you have a typo: a comma is needed before the column indicator: df[, "price"].

Edit: I am getting a little tired of the comments by @42 about Gabor's (fine) solution "dominating" this one, so here is a minimal benchmark:

R> library(microbenchmark)
R> v <- c("2017-01-02", "2017-01-03 04:05:06")
R> library(anytime)
R> print(microbenchmark(anytime(v), do.call("c", lapply(v, as.POSIXct))), digits=3)
Unit: microseconds
                                expr   min    lq  mean median    uq   max neval cld
                          anytime(v)  33.6  36.8  42.1   45.6  46.6  80.7   100  a 
 do.call("c", lapply(v, as.POSIXct)) 571.5 579.1 586.4  586.8 589.5 695.7   100   b
R>

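So, in short: "not really". It uses only base R, which is a plus, but it is a) harder to read and understand, b) less general, since it handles exactly one format (ISO style), and c) about thirteen times slower.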

Answer 3

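Most of lubridate's parsing functions have a truncated argument, which takes a number indicating how many elements may be missing from the end. The missing elements are filled in with zeros.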

Example with the data at hand:

lubridate::ymd_hms(c("2017-01-02", "2017-01-03 04:05:06"), truncated = 3)
## [1] "2017-01-02 00:00:00 UTC" "2017-01-03 04:05:06 UTC"

Answer 4

1) To get a "POSIXct" date-time vector, try converting each date-time to "POSIXct" separately and then concatenating the results:

do.call("c", lapply(df$effective_date, as.POSIXct))

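2) Another base solution, which is shorter and also faster, is the following. It relies on the fact that as.POSIXct ignores junk at the end of the string.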

as.POSIXct(paste(df$effective_date, "00:00:00"))
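
To tie the two base approaches back to the original goal, here is a rough sketch that builds the index both ways and then forms the xts object. The data frame is reconstructed from the question. The UTC zone, the numeric conversion of price and the variable names are assumptions. Note that the xts argument is spelled order.by.

```r
df <- data.frame(
  effective_date = c("1990-01-01", "1990-01-02 00:05:00"),
  price = c("100", "200"),
  stringsAsFactors = FALSE
)

# Approach 1: convert element by element, then concatenate
idx1 <- do.call("c", lapply(df$effective_date, as.POSIXct, tz = "UTC"))

# Approach 2: append a midnight time; the surplus text on rows that
# already carry a time is ignored by the parser
idx2 <- as.POSIXct(paste(df$effective_date, "00:00:00"), tz = "UTC")

# Both approaches should describe the same instants
same_index <- all(idx1 == idx2)

# Build the xts object only if the xts package is installed
if (requireNamespace("xts", quietly = TRUE)) {
  x <- xts::xts(cbind(price = as.numeric(df$price)), order.by = idx2)
  print(x)
}
```

Passing tz explicitly keeps the result independent of the machine's local time zone, which may matter if the data is later merged with other series.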
