Calculating the weighted mean of all numerical columns
Example data:
library(data.table)
set.seed(1)
DT <- data.table(panelID = sample(50,50), # Creates a panel ID
                 Country = c(rep("Albania",30),rep("Belarus",50), rep("Chilipepper",20)),
                 some_NA = sample(0:5, 6),
                 some_NA_factor = sample(0:5, 6),
                 Group = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                 Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                 wt = 15*round(runif(100)/10,2),
                 Income = round(rnorm(10,-5,5),2),
                 Happiness = sample(10,10),
                 Sex = round(rnorm(10,0.75,0.3),2),
                 Age = sample(100,100),
                 Educ = round(rnorm(10,0.75,0.3),2))
DT[, uniqueID := .I] # Creates a unique ID #
DT$some_NA_factor <- factor(DT$some_NA_factor)
I want to calculate the weighted mean of all numerical columns, so I tried:
DT_w <- DT[,lapply(Filter(is.numeric,.SD), function(x) weighted.mean(DT$wt, x, na.rm=TRUE)), by=c("Country", "Time")]
But it then says:
Error in weighted.mean.default(DT$wt, x, na.rm = TRUE) :
'x' and 'w' must have the same length
I think I may have misunderstood the syntax. Am I doing this right?
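Two issues:
1. When you use DT$wt, that is an explicit reference to the entire wt column of the DT table, so the by argument has no effect on it. by only applies to columns referenced without the DT$ prefix.
2. weighted.mean() takes x first and w (the weights) second. You seem to have these reversed.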
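To make both points concrete, here is a minimal sketch on a small hypothetical table (the names `toy`, `g` and `x` are made up for illustration and are not from the question). It prints how many weights each group sees through a bare `wt` versus `toy$wt`, and what happens when the arguments of `weighted.mean()` are swapped.

```r
library(data.table)

# Hypothetical toy table: group "a" has 3 rows, group "b" has 1 row
toy <- data.table(g  = c("a", "a", "a", "b"),
                  x  = c(1, 2, 3, 10),
                  wt = c(3, 1, 1, 2))

# Issue 1: inside j with `by`, a bare `wt` is the current group's slice,
# while `toy$wt` always refers to the full 4-element column
lens <- toy[, .(n_bare = length(wt), n_dollar = length(toy$wt)), by = g]
print(lens)

# Issue 2: weighted.mean(x, w) takes the values first and the weights second
right <- weighted.mean(c(1, 2, 3), w = c(3, 1, 1))  # (1*3 + 2*1 + 3*1) / 5
wrong <- weighted.mean(c(3, 1, 1), w = c(1, 2, 3))  # swapped: (3*1 + 1*2 + 1*3) / 6
print(c(right = right, wrong = wrong))
```

With the 100-row DT from the question, the same mismatch is what triggers 'x' and 'w' must have the same length: the first argument is the full DT$wt column, while the second is a per-group slice. When the lengths happen to match, swapping the arguments does not error; it silently returns a different number.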
Fixing both issues:
DT_w <- DT[,lapply(Filter(is.numeric,.SD), function(x) weighted.mean(x, w = wt, na.rm=TRUE)), by=c("Country", "Time")]
# runs without errors
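One way to sanity-check the fixed call is to recompute a single group by hand. The sketch below does that on hypothetical toy data (`toy`, `Label` and the values are made up). It also shows a variant that is not from the answer above: it selects the numeric columns once via `.SDcols` and leaves out `wt` itself, since the weighted mean of the weights is rarely wanted. Columns named in `by` are already excluded from `.SD`, and `wt` stays accessible inside `j` because it is referenced there.

```r
library(data.table)

# Hypothetical toy data standing in for DT
toy <- data.table(Country = c("A", "A", "B", "B"),
                  Income  = c(10, 20, 5, NA),
                  Age     = c(30, 50, 40, 60),
                  Label   = factor(c("p", "q", "p", "q")),
                  wt      = c(1, 3, 2, 2))

# Numeric columns to average (factors are not numeric), minus the weight column
num_cols <- setdiff(names(toy)[vapply(toy, is.numeric, logical(1))], "wt")

# Per-group weighted means; na.rm = TRUE drops an NA value together with its weight
res <- toy[, lapply(.SD, weighted.mean, w = wt, na.rm = TRUE),
           by = Country, .SDcols = num_cols]
print(res)

# Cross-check one cell by hand: Country "A", Income
manual <- with(toy[Country == "A"], sum(Income * wt) / sum(wt))
stopifnot(isTRUE(all.equal(res[Country == "A", Income], manual)))
```

For Country "B", the NA in Income is removed along with its weight, so that cell is just 5.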