Using aggregate to compute monthly weighted average

Question

I need to calculate a monthly weighted average. The data frame looks like this:

            Month Variable Weighting
460773 1998-06-01       11    153.00
337134 1998-06-01        9      0.96
473777 1998-06-01       10    264.00
358226 1998-06-01        6      0.52
414626 1998-06-01       10     34.00
341020 1998-05-01        9      1.64
453066 1998-05-01        5     26.00
183276 1998-05-01        8      0.51
403729 1998-05-01        6    123.00
203005 1998-05-01       11      0.89

When I use aggregate to compute the plain (unweighted) mean, for example

 Output <- aggregate(Variable ~ Month, df , mean )
 Output
       Month Variable
1 1998-05-01      7.8
2 1998-06-01      9.2

I get the correct result. However, when I try to add the weights to the aggregate call, for example

Output <- aggregate(Variable ~ Month, df , FUN = weighted.mean, w = df$Weighting)

I get an error about differing vector lengths:

Error in weighted.mean.default(X[[1L]], ...) : 
'x' and 'w' must have the same length

Is there a way to remedy this?

Answer 1

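This does not work with aggregate() directly, because your weight vector is not partitioned during aggregate(): FUN receives only one group's values of Variable at a time, while w = df$Weighting is always the full column. You can use by(), or split() plus sapply(), or the add-on package data.table, or the function ddply() from package plyr, or functions from package dplyr.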
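
To make the length mismatch concrete, here is a minimal sketch. It assumes the sample data from the question can be rebuilt as below (the original row names are omitted, and Month is assumed to be a Date column). It only counts how many values FUN receives per group and compares that with the length of the full weight column.

```r
# Sample data rebuilt from the question (row names omitted; Month assumed to be Date)
df <- data.frame(
  Month     = as.Date(rep(c("1998-06-01", "1998-05-01"), each = 5)),
  Variable  = c(11, 9, 10, 6, 10, 9, 5, 8, 6, 11),
  Weighting = c(153, 0.96, 264, 0.52, 34, 1.64, 26, 0.51, 123, 0.89)
)

# Number of Variable values that FUN receives for each month
group_sizes <- aggregate(Variable ~ Month, df, FUN = length)

# Each group has 5 values, but w = df$Weighting always has 10 elements,
# so weighted.mean() complains that 'x' and 'w' differ in length
print(group_sizes)
print(length(df$Weighting))
```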

Example with split() plus sapply():

sapply(split(df, df$Month), function(d) weighted.mean(d$Variable, w = d$Weighting))

Result:

1998-05-01 1998-06-01 
   5.89733   10.33142

A variant with by():

by(df, df$Month, FUN=function(d) weighted.mean(d$Variable, w = d$Weighting)) # or
unclass(by(df, df$Month, FUN=function(d) weighted.mean(d$Variable, w = d$Weighting)))

With package plyr:

library(plyr)
ddply(df, ~Month, summarize, weighted.mean(Variable, w=Weighting))

And with data.table:

library(data.table)
setDT(df)[, weighted.mean(Variable, w = Weighting), Month]
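
With dplyr, a version might look like the following sketch. It assumes dplyr is installed and rebuilds the sample data from the question (row names omitted); the column name wmean is only an illustrative choice.

```r
library(dplyr)

# Sample data rebuilt from the question (row names omitted; Month assumed to be Date)
df <- data.frame(
  Month     = as.Date(rep(c("1998-06-01", "1998-05-01"), each = 5)),
  Variable  = c(11, 9, 10, 6, 10, 9, 5, 8, 6, 11),
  Weighting = c(153, 0.96, 264, 0.52, 34, 1.64, 26, 0.51, 123, 0.89)
)

# Group by month; inside summarise() both columns refer to the current group only
result <- df %>%
  group_by(Month) %>%
  summarise(wmean = weighted.mean(Variable, w = Weighting))

print(result)
```

Because Variable and Weighting are both evaluated within each group, the two vectors always have the same length.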

Answer 2

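If you do not have plyr, dplyr, or data.table installed and cannot install them for some reason, it is still possible to use aggregate to compute the monthly weighted average. All you need is the following trick: aggregate over a column of row indices, so that FUN receives each group's indices and can subset both Variable and Weighting with them.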

df$row <- 1:nrow(df) #the trick
aggregate(row~Month, df, function(i) mean(df$Variable[i])) #mean
aggregate(row~Month, df, function(i) weighted.mean(df$Variable[i], df$Weighting[i])) #weighted mean

The output looks like this:

Mean:

> aggregate(row~Month, df, function(i) mean(df$Variable[i]))
       Month row
1 1998-05-01 7.8
2 1998-06-01 9.2

Weighted mean:

> aggregate(row~Month, df, function(i) weighted.mean(df$Variable[i], df$Weighting[i]))
       Month      row
1 1998-05-01  5.89733
2 1998-06-01 10.33142
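
As a cross-check (not part of the answer above), the weighted mean can also be computed from its definition, sum(x * w) / sum(w), using aggregate with sum. This is a sketch under the assumption that the sample data is rebuilt as below; the helper column wx and the result column wmean are hypothetical names introduced for illustration.

```r
# Sample data rebuilt from the question (row names omitted; Month assumed to be Date)
df <- data.frame(
  Month     = as.Date(rep(c("1998-06-01", "1998-05-01"), each = 5)),
  Variable  = c(11, 9, 10, 6, 10, 9, 5, 8, 6, 11),
  Weighting = c(153, 0.96, 264, 0.52, 34, 1.64, 26, 0.51, 123, 0.89)
)

# Hypothetical helper column: value times weight, computed row by row
df$wx <- df$Variable * df$Weighting

# Sum the products and the weights per month in a single aggregate() call
sums <- aggregate(cbind(wx, Weighting) ~ Month, df, FUN = sum)

# Weighted mean = sum of products / sum of weights
sums$wmean <- sums$wx / sums$Weighting
print(sums[, c("Month", "wmean")])
```

This gives the same values as the row-index trick, because both weights and products are summed within each month before dividing.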
