Calculate grouped means, ignoring each row's own value
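I am using the following code to compute the grouped mean for each class. I need the mean of each class placed on every row, but with that row's own value left out of the calculation (see the expected_means column). The data.table approach below does compute the mean, but it does not exclude the current row (see the value_mean column).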
## create dataset
dataset <- data.frame(matrix(ncol = 2, nrow = 6))
colnames(dataset) <- c('class','value')
dataset$class <- c(rep('A',3),rep('B',3))
dataset$value <- 1:6
## convert to data.table and aggregate
library(data.table)
setDT(dataset)
dataset[, value_mean := mean(value), by=class]
## expected means (without itself)
dataset$expected_means <- c(2.5,2,1.5,5.5,5,4.5)
This returns:
class value value_mean expected_means
A 1 2 2.5
A 2 2 2.0
A 3 2 1.5
B 4 5 5.5
B 5 5 5.0
B 6 5 4.5
I need the mean of each class, placed on every row, but excluding the current value. For example, for the first row, instead of (1+2+3)/3 it should compute (2+3)/2.
There is surely a more efficient way than sapply, but you can do this:
setDT(dataset)[, value_mean := sapply(1:.N, function(x) mean(value[-x])), by = class]
Output:
class value expected_means value_mean
1: A 1 2.5 2.5
2: A 2 2.0 2.0
3: A 3 1.5 1.5
4: B 4 5.5 5.5
5: B 5 5.0 5.0
6: B 6 4.5 4.5
You can use sqldf:
library(sqldf)
dataset <- data.frame(class = rep(c("A", "B"), each = 3),
                      value = 1:6,
                      stringsAsFactors = FALSE)
result = sqldf('select d.*,
                       t.sum * 1.0 / (t.count * 1.0) as value_mean,
                       (t.sum - d.value)*1.0/ ((t.count - 1) * 1.0) as expected_means
                from dataset as d JOIN
                     (select class, sum(value) as sum, count(*) as count
                      from dataset
                      group by class) as t
                on d.class = t.class')
Here is another option:
dataset[, expected_means := (sum(value) - value) / (.N - 1L), class]
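The one-liner above relies on a simple identity: the mean of the other values in a group equals (group sum - own value) / (group size - 1), so nothing has to be recomputed row by row. As a rough sketch of the same idea in base R, the helper name loo_mean below is hypothetical (it is not part of data.table or any package), and returning NA for a single-row class is an assumption about what is wanted there:

```r
## Same sample data as in the question.
dataset <- data.frame(class = rep(c("A", "B"), each = 3),
                      value = 1:6,
                      stringsAsFactors = FALSE)

## Hypothetical helper: for each element of x, the mean of the other elements.
## A group with a single element has no "other" values, so NA is returned
## (an assumption; without this guard the formula would divide by zero).
loo_mean <- function(x) {
  n <- length(x)
  if (n < 2L) return(rep(NA_real_, n))
  (sum(x) - x) / (n - 1L)
}

## ave() applies loo_mean within each class and returns a vector
## aligned with the original rows.
dataset$expected_means <- ave(as.numeric(dataset$value), dataset$class,
                              FUN = loo_mean)
print(dataset)
```

Note that this sketch does not handle missing values: if value contains NA, every row in that class comes out as NA.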