R timeSeries Aggregate() Function Not Including First Minute Of Hour
I am trying to aggregate data from a timeSeries object using the timeSeries package in R. Here is some basic example code for reference:
library(timeSeries)
library(timeDate)
BD <- as.timeDate(paste("2015-01-01", "00:00:00")) # Creates a timeDate.
ED <- as.timeDate(paste("2015-01-31", "23:59:00")) # Creates a timeDate.
DR <- seq(BD, ED, by = 60) # Creates a sequence by minutes in between the 2 dates.
data <- runif(length(DR), 0, 100) # Creating random sample data.
x <- timeSeries(data, DR) # Initializing a timeSeries object from data and DR.
colnames(x) <- "Data" # Renaming column.
by = timeSequence(BD, ED, by = "hour") # Setting the sequence to be aggregated on.
x.agg <- timeSeries::aggregate(x, by, sum) # Aggregating on that sequence.
After running the code, head(x.agg) looks like this:
> head(x.agg)
GMT
Data
2015-01-01 00:00:00 29.71688
2015-01-01 01:00:00 3129.84860
2015-01-01 02:00:00 2398.92438
2015-01-01 03:00:00 3134.78608
2015-01-01 04:00:00 2743.79543
2015-01-01 05:00:00 3159.38404
Note that the first value, "2015-01-01 00:00:00", is far smaller than the other hourly sums. In fact, it is exactly the first data point of the original sample:
> head(x)
GMT
Data
2015-01-01 00:00:00 29.71688
2015-01-01 00:01:00 38.73175
2015-01-01 00:02:00 1.01945
2015-01-01 00:03:00 89.64938
2015-01-01 00:04:00 34.23608
2015-01-01 00:05:00 90.48571
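After investigating where the sums come from, I found that the aggregate for the "2015-01-01 01:00:00" hour is the sum of all observations from "2015-01-01 00:01:00" to "2015-01-01 01:00:00" inclusive, as this code shows: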
> sum(x[2:61,])
[1] 3129.849
> x.agg[2,]
GMT
Data
2015-01-01 01:00:00 3129.849
What I need is the sum of all data points within the "00:00:00" hour. That is, the aggregate for "2015-01-01 00:00:00" should equal:
> sum(x[1:60,])
[1] 3065.829
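That sum includes the first minute of that hour, not the first minute of the next hour as aggregate does. It seems the aggregate function treats the first minute of an hour as not being part of that hour, which I find odd. Any help would be appreciated.
It looks like I found the answer to my own question. It involves modifying the source code of the timeSeries::aggregate() function. To get what I wanted in the question above, go to the source of the timeSeries package, which is easiest to find by downloading the tar.gz file from CRAN here: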
https://cran.r-project.org/web/packages/timeSeries/index.html
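Extract the archive and go into the R folder inside the timeSeries folder. Find the file "stats-aggregate.R" and open it in R. In it you will see the .aggregate.timeSeries function. Inside that function, the change needed to get the result I wanted is to remove the +1 from lines 80 and 81. After that, the aggregate function aggregates the way I wanted.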
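To see why removing the two +1 terms moves the bucket boundaries, here is a small base-R sketch using findInterval() on plain seconds. The "original" form below is an assumption reconstructed from the description above (a +1 on the break points and a +1 on the resulting index); the names positions, breaks, index_original and index_modified are made up for illustration and are not part of the package.

```r
# Minute timestamps as seconds since midnight: 00:00, 00:01, ..., 02:00.
positions <- seq(0, 2 * 3600, by = 60)

# Hourly break points: 00:00, 01:00, 02:00.
breaks <- c(0, 3600, 7200)

# Assumed original indexing: breaks shifted by one second, index shifted by one.
# A timestamp exactly on a break falls into the bucket that ENDS at that break.
index_original <- findInterval(positions, breaks + 1) + 1

# Modified indexing: plain findInterval().
# A timestamp exactly on a break falls into the bucket that STARTS at that break.
index_modified <- findInterval(positions, breaks)

# Number of observations assigned to each bucket.
counts_original <- as.integer(table(index_original))
counts_modified <- as.integer(table(index_modified))

print(counts_original)
print(counts_modified)
```

With the assumed original indexing, the first bucket holds only the 00:00 observation, which matches the single data point seen in head(x.agg). With the modified indexing, the first bucket holds the 60 observations from 00:00 to 00:59.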
Here is the modified function (I also changed its name):
`modTSAgg <-
function(x, by, FUN, ...)
{
# A function implemented by Yohan Chalabi and Diethelm Wuertz
# Description:
# Aggregates a 'timeSeries' object
# Details:
# This function can be used to aggregate and coarsen a
# 'timeSeries' object.
# Arguments:
# x - a 'timeSeries' object to be aggregated
# by - a calendrical block
# FUN - function to be applied, by default 'colMeans'
# ... - additional argument to be passed to the newly generated
# 'timeSeries' object
# Value:
# Returns a S4 object of class 'timeSeries'.
# Examples:
# Quarterly Aggregation:
# m = matrix(rep(1:12,2), ncol = 2)
# ts = timeSeries(m, timeCalendar())
# Y = getRmetricsOptions("currentYear"); Y
# from = paste(Y, "04-01", sep = "-"); to = paste(Y+1, "01-01", sep = "-")
# by = timeSequence(from, to, by = "quarter") - 24*3600; by
# ts; aggregate(ts, by, sum)
# Weekly Aggregation:
# dates = timeSequence(from = "2009-01-01", to = "2009-02-01", by = "day")
# data = 10 * round(matrix(rnorm(2*length(dates)), ncol = 2), 1); data
# ts = timeSeries(data = data, charvec = dates)
# by = timeSequence(from = "2009-01-08", to = "2009-02-01", by = "week")
# by = by - 24*3600; aggregate(ts, by, sum)
# FUNCTION:
# Check Arguments:
if (!((inherits(by, "timeDate") && x@format != "counts") ||
(is.numeric(by) && x@format == "counts")))
stop("'by' should be of the same class as 'time(x)'", call.=FALSE)
# Extract Title and Documentation:
Title <- x@title
Documentation <- x@documentation
# Make sure that x is sorted:
if (is.unsorted(x))
x <- sort(x)
# Sort and remove double entries in by:
by <- unique(sort(by))
INDEX <- findInterval(x@positions, as.numeric(by, "sec"))
INDEX <- INDEX
is.na(INDEX) <- !(INDEX <= length(by))
# YC : ncol important to avoid problems of dimension dropped by apply
data <- matrix(apply(getDataPart(x), 2, tapply, INDEX, FUN), ncol=ncol(x))
rownames(data) <- as.character(by[unique(na.omit(INDEX))])
colnames(data) <- colnames(x)
ans <- timeSeries(data, ...)
# Preserve Title and Documentation:
ans@title <- Title
ans@documentation <- Documentation
# Return Value:
ans
}
setMethod("aggregate", "timeSeries", function(x, by, FUN, ...)
modTSAgg(x, by, FUN, ...))
# until UseMethod dispatches S4 methods in 'base' functions
aggregate.timeSeries <- function(x, ...) modTSAgg(x, ...)`
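If patching the package is not an option, the same hour-start bucketing can be checked independently in base R. The following is only a sketch under stated assumptions: it uses POSIXct timestamps instead of a timeSeries object, covers three hours rather than the full month, and the names times, values, hour_start and hourly are hypothetical.

```r
set.seed(1)

# Three hours of minute data in GMT, analogous to the example in the question.
times <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "GMT"),
             as.POSIXct("2015-01-01 02:59:00", tz = "GMT"),
             by = 60)
values <- runif(length(times), 0, 100)

# Floor each timestamp to the start of its hour, so 00:00 through 00:59
# all map to 00:00, 01:00 through 01:59 map to 01:00, and so on.
hour_start <- as.POSIXct(floor(as.numeric(times) / 3600) * 3600,
                         origin = "1970-01-01", tz = "GMT")

# Sum the values within each hour, labelled by the hour's first minute.
hourly <- tapply(values, format(hour_start, "%Y-%m-%d %H:%M:%S"), sum)

print(hourly)
```

Here the first entry of hourly equals sum(values[1:60]), which is the behaviour asked for in the question. Comparing this against the output of the modified function on the same data is one way to confirm the patch does what is intended.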