Computing weekly/daily means for multiple time series inside a data.table

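I'm new to R and have run into some problems performing more complex operations with data.table. I'm not sure whether this can be done at all, so I'd appreciate any help.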

First, let me describe my problem and the desired result using simplified example data:

type     region     quantity     timestamp
small     A            2         05/01/15 10:00
small     A            1         05/01/15 10:00
small     B            1         05/01/15 10:00
big       A            1         05/01/15 10:00
small     A            2         05/01/15 11:00
small     B            1         05/01/15 11:00
small     A            1         05/01/15 12:00
small     A            1         05/01/15 12:00
small     B            4         05/01/15 12:00
big       A            1         05/01/15 12:00
small     A            2         05/01/15 13:00
small     A            1         05/01/15 13:00
small     B            1         05/01/15 13:00
big       A            1         05/01/15 13:00
small     A            2         05/01/15 14:00
small     B            1         05/01/15 14:00
small     A            1         05/01/15 14:00
small     A            2         05/01/15 14:00
small     B            2         05/01/15 14:00
big       A            1         05/01/15 14:00
small     A            2         05/01/15 15:00
small     A            1         05/01/15 15:00
small     B            1         05/01/15 15:00
big       A            1         05/01/15 15:00
small     A            2         05/01/15 16:00
small     B            1         05/01/15 16:00
small     A            1         05/01/15 16:00
small     A            3         05/01/15 16:00
small     B            1         05/01/15 16:00
big       A            1         05/01/15 16:00
small     A            2         05/01/15 17:00
small     A            1         05/01/15 17:00
small     B            1         05/01/15 17:00
big       A            1         05/01/15 17:00
small     A            2         05/01/15 18:00
small     B            1         05/01/15 18:00
small     A            1         05/01/15 18:00
small     A            1         05/01/15 18:00
small     B            1         05/01/15 18:00
big       A            1         05/01/15 18:00
small     A            2         05/01/15 19:00
small     A            1         05/01/15 19:00
small     B            1         05/01/15 19:00
big       B            1         05/01/15 19:00
small     B            2         05/01/15 20:00
small     B            1         05/01/15 20:00
small     A            1         05/01/15 20:00
small     A            1         05/01/15 20:00
small     B            1         05/01/15 20:00
big       A            1         05/01/15 20:00
small     A            2         05/01/15 21:00
small     A            3         05/01/15 22:00
small     B            1         05/01/15 23:00
big       A            1         06/01/15 00:00
small     A            2         06/01/15 00:00
small     B            1         06/01/15 00:00
small     A            1         06/01/15 01:00
small     A            1         06/01/15 01:00
small     B            1         06/01/15 01:00
big       A            1         06/01/15 01:00
big       A            1         06/01/15 02:00
small     A            2         06/01/15 02:00
small     B            1         06/01/15 02:00
small     A            1         06/01/15 03:00
big       A            1         06/01/15 04:00
big       A            1         06/01/15 04:00
small     A            2         06/01/15 04:00
small     B            1         06/01/15 04:00
small     A            1         06/01/15 05:00
small     A            1         06/01/15 05:00
small     B            1         06/01/15 05:00
big       A            1         06/01/15 05:00

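What I need to do is generate weekly means (or total quantities) for each unique combination of type and region.

For example, this means:

week1 (05/01/15 00:00 - 12/01/15 00:00): 50 hours of 'small' in region 'A'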
...

Each unique combination of type and region must be handled separately. To do that, I think I need to perform the following steps:

1. Load the csv
2. Aggregate all the rows with same combinations together (there could be duplicate rows with different quantities)
3. Compute the weekly means or sums for each unique combination
4. Save the results into multiple csv files (one file per unique combination)

Here is my code so far; I'm really stuck on steps 3 and 4. Any suggestions on how to accomplish this would be very helpful.

# parse CSV
library(data.table)
DF <- read.table(file="data.csv",header=TRUE,sep=",",check.names=FALSE)

# aggregate same values together
DT <- data.table(DF)
aggregated <- DT[, .(quant = sum(quantity)), by = .(timestamp, region, type)]

print(aggregated)

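EDIT: I have added more data to the example. To understand how this is done (and to avoid posting thousands of rows here), computing just the daily means would be enough. I believe converting that to weekly means will be easy.

EDIT2: If a combination exists at least once in the data set, I need its weekly result to be shown even when it is 0. Is there a way to insert zero values for periods that have no records?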

Caveat: I don't usually use data.table. For me it's usually data frames and ddply. But assuming your data.table aggregation works, the snippet below should do the trick...

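DT <- data.table(DF)
# The example timestamps look like day/month/year (an assumption), so pass the
# format explicitly; by default as.Date() expects the year first. The time part is ignored.
DT[, date := as.Date(timestamp, format = "%d/%m/%y")]
aggregated <- DT[, .(quant = mean(quantity)), by = .(date, region, type)]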
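To try the daily aggregation without the original data.csv, here is a minimal, self-contained sketch. The four rows are made up for illustration, and the day/month/two-digit-year format is an assumption based on the week range given in the question.

```r
library(data.table)

# Hypothetical miniature of the question's data (not the real data.csv)
DT <- data.table(
  type      = c("small", "small", "big", "small"),
  region    = c("A", "A", "A", "A"),
  quantity  = c(2, 1, 1, 3),
  timestamp = c("05/01/15 10:00", "05/01/15 11:00",
                "05/01/15 10:00", "06/01/15 00:00")
)

# Assumption: timestamps are day/month/year; the trailing time is ignored
DT[, date := as.Date(timestamp, format = "%d/%m/%y")]

# One row per (date, region, type) holding the mean quantity of that day
daily <- DT[, .(quant = mean(quantity)), by = .(date, region, type)]
print(daily)
```

Replacing mean() with sum() should give daily totals instead, if that is what is wanted.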

EDIT:

Weekly:

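library(ISOweek)
DT <- data.table(DF)
# Assuming day/month/year timestamps; as.Date() needs the format spelled out
DT[, date := as.Date(timestamp, format = "%d/%m/%y")]
# ISOweek() returns labels such as "2015-W02"
DT[, week := ISOweek(date)]
aggregated <- DT[, .(quant = mean(quantity)), by = .(week, region, type)]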
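The snippet above does not cover EDIT2 (zeros for weeks without records) or step 4 (one file per combination). One possible approach is sketched below; it is not necessarily the best one. It swaps the ISOweek label for the date of the Monday that starts each week, so that missing weeks can be generated with seq(). The sample rows, the week_start column, the weekly_<type>_<region>.csv file names and the use of tempdir() are all assumptions made for illustration.

```r
library(data.table)

# Hypothetical miniature data spanning two weeks (not the real data.csv)
DT <- data.table(
  type      = c("small", "small", "big", "small"),
  region    = c("A", "A", "A", "B"),
  quantity  = c(2, 1, 1, 4),
  timestamp = c("05/01/15 10:00", "13/01/15 11:00",
                "05/01/15 10:00", "14/01/15 09:00")
)
DT[, date := as.Date(timestamp, format = "%d/%m/%y")]

# Monday of each row's week; data.table's wday() is 1 = Sunday ... 7 = Saturday
DT[, week_start := date - ((wday(date) + 5L) %% 7L)]

# Weekly totals for the combinations that actually occur
weekly <- DT[, .(quant = sum(quantity)), by = .(week_start, region, type)]

# Every observed (region, type) pair crossed with every week in the range
all_weeks <- seq(min(DT$week_start), max(DT$week_start), by = "week")
grid <- unique(DT[, .(region, type)])[, .(week_start = all_weeks),
                                      by = .(region, type)]

# Join onto the full grid, then turn the missing weeks into zeros
filled <- weekly[grid, on = .(week_start, region, type)]
filled[is.na(quant), quant := 0]

# Step 4: one CSV per combination (written to a temp dir in this sketch)
out_dir <- tempdir()
for (part in split(filled, by = c("type", "region"))) {
  out_file <- sprintf("weekly_%s_%s.csv", part$type[1], part$region[1])
  fwrite(part, file.path(out_dir, out_file))
}
```

Zero-filling fits sums more naturally than means, since a mean over no records is undefined; for weekly means you may prefer to leave those weeks as NA.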