Change of value format to standard 30-second format in R
I want to reformat non-standard change-of-value data (a row is recorded only when Value changes) into a standard 30-second interval format.
I have df:
Timestamp Value
6/26/2018 0:00:06 10
6/26/2018 0:01:06 15
6/26/2018 0:02:15 20
and its dput:
structure(list(Timestamp = c("6/26/2018 0:00:06", "6/26/2018 0:01:06",
"6/26/2018 0:02:15"), Value = c(10L, 15L, 20L)), .Names = c("Timestamp",
"Value"), class = "data.frame", row.names = c(NA, -3L))
What I want, formatted_df:
Timestamp Value
6/26/2018 0:00:30 10
6/26/2018 0:01:00 10
6/26/2018 0:01:30 15
6/26/2018 0:02:00 15
6/26/2018 0:02:30 20
My attempt:
Using functions from lubridate and dplyr, I get timestamps rounded up to multiples of 30 seconds, but this does not produce a regular 30-second series:
formatted <- df %>% mutate(Timestamp_Date = as.POSIXct(Timestamp, tz = "US/Eastern", usetz = TRUE, format="%m/%d/%Y %H:%M:%S"),
rounded_timestamp = ceiling_date(Timestamp_Date, unit = "30 seconds"))
which gives formatted:
Timestamp Value Timestamp_Date rounded_timestamp
6/26/2018 0:00:06 10 6/26/2018 0:00:06 6/26/2018 0:00:30
6/26/2018 0:01:06 15 6/26/2018 0:01:06 6/26/2018 0:01:30
6/26/2018 0:02:15 20 6/26/2018 0:02:15 6/26/2018 0:02:30
I think lubridate and dplyr are useful here, but I bet data.table could do it.
You can use a data.table rolling join.
library(data.table)
#convert df into data.table and Timestamp into POSIX format
setDT(df)[, Timestamp := as.POSIXct(Timestamp, format="%m/%d/%Y %H:%M:%S")]
#create the intervals of 30seconds according to needs
tstmp <- seq(as.POSIXct("2018-06-26 00:00:30", tz=""),
as.POSIXct("2018-06-26 00:02:30", tz=""),
by="30 sec")
#rolling join between intervals and df
df[.(Timestamp=tstmp), on=.(Timestamp), roll=Inf]
Output:
Timestamp Value
1: 2018-06-26 00:00:30 10
2: 2018-06-26 00:01:00 10
3: 2018-06-26 00:01:30 15
4: 2018-06-26 00:02:00 15
5: 2018-06-26 00:02:30 20
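The grid endpoints above are typed in by hand. As a rough sketch, they could instead be derived from the data by rounding the first and last timestamps up to the next 30-second boundary. This uses base R only; `ceil_to_step`, `ts` and `step` are names made up for this illustration, and the timestamps are parsed as UTC here (an assumption, so the result does not depend on the local time zone).

```r
# Timestamps from the question, parsed as UTC (assumption for reproducibility)
ts <- as.POSIXct(c("6/26/2018 0:00:06", "6/26/2018 0:01:06", "6/26/2018 0:02:15"),
                 format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

step <- 30  # grid spacing in seconds

# Hypothetical helper: round a POSIXct value up to the next multiple of `step` seconds.
# A value that already sits on a boundary is left unchanged.
ceil_to_step <- function(x, step) {
  as.POSIXct(ceiling(as.numeric(x) / step) * step,
             origin = "1970-01-01", tz = "UTC")
}

# Regular grid from the first rounded timestamp to the last rounded timestamp
tstmp <- seq(ceil_to_step(min(ts), step), ceil_to_step(max(ts), step), by = step)
format(tstmp, "%H:%M:%S")
```

The resulting `tstmp` can be passed to the rolling join in the same way as the hand-written sequence, provided it uses the same time zone as the Timestamp column.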
For details, see the roll argument in ?data.table.
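To get a feel for what roll=Inf does in this case: each grid point takes the Value of the most recent observation at or before it (last observation carried forward). Here is a minimal base R sketch of that idea using findInterval. It is not how data.table implements the join, just an equivalent lookup for this small example; `obs_time`, `obs_value`, `grid` and `locf` are illustrative names, UTC is assumed, and the observation times must be sorted in increasing order.

```r
# Observations from the question, parsed as UTC (assumption for reproducibility)
obs_time  <- as.POSIXct(c("6/26/2018 0:00:06", "6/26/2018 0:01:06", "6/26/2018 0:02:15"),
                        format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
obs_value <- c(10L, 15L, 20L)

# Regular 30-second grid covering the same range as in the answer
grid <- seq(as.POSIXct("2018-06-26 00:00:30", tz = "UTC"),
            as.POSIXct("2018-06-26 00:02:30", tz = "UTC"),
            by = "30 sec")

# For each grid point, count the observations with time <= that grid point;
# the count is the index of the most recent observation (0 means none yet).
idx <- findInterval(as.numeric(grid), as.numeric(obs_time))
idx[idx == 0] <- NA  # grid points before the first observation get NA

locf <- data.frame(Timestamp = grid, Value = obs_value[idx])
locf
```

For data of any real size the rolling join is likely the more convenient choice, since it handles extra columns and keyed tables directly.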