How to average across several rows, for many columns, using data.table in R?
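I have a dataset in which pairs of rows can share the same value of the variable X1. I want to average the values in columns 2:40 across each such pair of rows, collapsing each pair into a single new row. Is there a simple way to do this?
If it were just one column, I think I could do this: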
d[, X2 := mean(X2), by = X1]
But this gets very tedious for many columns. Is there a way to do this in data.table without typing X := mean(X) for every column?
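Edit:
Here is a reproducible example. I basically want to end up with ten rows, one for each value of "cat". Each row would contain the mean of x1, x2, and x3 for that level of "cat".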
cat <- rep(1:10, times = 2)
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- rnorm(20)
dat <- cbind(cat, x1, x2, x3)
dat <- as.data.frame(dat)
I'm not sure whether this solution fits, since you didn't provide a minimal reproducible example, but maybe something like this?
library(data.table)
df <- data.frame(X1 = rep(1:50, each = 2),
X2 = rep(x = 1:2, times = 50),
X3 = rep(x = 1:2, times = 50),
X4 = rep(x = 1:2, times = 50),
X5 = rep(x = 1:2, times = 50),
X6 = rep(x = 1:2, times = 50),
X7 = rep(x = 1:2, times = 50),
X8 = rep(x = 1:2, times = 50),
X9 = rep(x = 1:2, times = 50),
X10 = rep(x = 1:2, times = 50)
)
setDT(df)
head(df)
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> 1: 1 1 1 1 1 1 1 1 1 1
#> 2: 1 2 2 2 2 2 2 2 2 2
#> 3: 2 1 1 1 1 1 1 1 1 1
#> 4: 2 2 2 2 2 2 2 2 2 2
#> 5: 3 1 1 1 1 1 1 1 1 1
#> 6: 3 2 2 2 2 2 2 2 2 2
df2 <- df[ ,lapply(.SD, mean), by = X1, .SDcols = X2:X10]
head(df2)
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> 1: 1 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 2: 2 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 3: 3 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 4: 4 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 5: 5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 6: 6 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
Created on 2021-07-16 by the reprex package (v2.0.0)
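Applied to the `dat` example from the question's edit, the same `lapply(.SD, mean)` pattern might look like the sketch below. It assumes `cat` is the only grouping column; the `set.seed()` call and the name `res` are additions made purely for illustration.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Same shape as the question's `dat`: 10 levels of `cat`, each appearing twice
dat <- data.table(
  cat = rep(1:10, times = 2),
  x1  = rnorm(20),
  x2  = rnorm(20),
  x3  = rnorm(20)
)

# .SD holds every column except the grouping column,
# so no per-column typing is needed
res <- dat[, lapply(.SD, mean), by = "cat"]

nrow(res)   # one row per level of `cat`
names(res)  # cat, x1, x2, x3
```

For the original 40-column case, `.SDcols` could presumably be given as column positions (for example `.SDcols = 2:40`) if only some of the columns should be averaged.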
--
Or this?
library(data.table)
df <- data.frame(X1 = 1:100,
X2 = rep(x = 1:2, times = 50),
X3 = rep(x = 1:2, times = 50),
X4 = rep(x = 1:2, times = 50),
X5 = rep(x = 1:2, times = 50),
X6 = rep(x = 1:2, times = 50),
X7 = rep(x = 1:2, times = 50),
X8 = rep(x = 1:2, times = 50),
X9 = rep(x = 1:2, times = 50),
X10 = rep(x = 1:2, times = 50)
)
setDT(df)
head(df)
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> 1: 1 1 1 1 1 1 1 1 1 1
#> 2: 2 2 2 2 2 2 2 2 2 2
#> 3: 3 1 1 1 1 1 1 1 1 1
#> 4: 4 2 2 2 2 2 2 2 2 2
#> 5: 5 1 1 1 1 1 1 1 1 1
#> 6: 6 2 2 2 2 2 2 2 2 2
df2 <- df[, lapply(.SD, mean, na.rm=TRUE), X1-0:1]
head(df2)
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#> 1: 1 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 2: 3 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 3: 5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 4: 7 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 5: 9 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
#> 6: 11 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5
Created on 2021-07-16 by the reprex package (v2.0.0)
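The grouping expression `X1 - 0:1` in the second variant is terse, so here is a small sketch of what it evaluates to. It relies on R recycling the length-2 vector `0:1` along `X1`, which only pairs rows correctly when `X1` holds consecutive integers starting at an odd number and the paired rows are adjacent. The names `grp` and `grp2` are illustrative, and `grp2` is an alternative that depends on row position rather than on the values of `X1`.

```r
X1 <- 1:10

# 0:1 is recycled: subtract 0 at odd positions and 1 at even positions,
# so each adjacent pair of rows gets the same key
grp <- X1 - 0:1
grp
#> [1] 1 1 3 3 5 5 7 7 9 9

# Position-based alternative: pair rows 1-2, 3-4, ... regardless of X1's values
grp2 <- (seq_along(X1) + 1L) %/% 2L
grp2
#> [1] 1 1 2 2 3 3 4 4 5 5
```

If `X1` already repeats within each pair, as in the first variant, plain `by = X1` is enough and neither key is needed.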