How do I calculate the mean for a data set in RStudio?
Below is my data set.
To find the mean of the whole column (column name: BP), I used the following R code:
library(Sleuth3)
ex0112
View(ex0112)
mean(ex0112$BP)
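But how do I calculate the mean of BP for the regular oil diet only?
I am new to R programming. Any help is much appreciated.
Maybe you can try the code below: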
with(ex0112,tapply(BP,Diet,mean))
If you are only interested in RegularOil:
with(ex0112,tapply(BP,Diet,mean))["RegularOil"]
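Since the question asks about one diet only, the same number can also be obtained by subsetting the column with a logical condition before calling mean(). A minimal sketch, assuming a small made-up data frame `toy` that stands in for ex0112 (the values are illustrative, not the Sleuth3 data, and the names `toy`, `regular_mean` and `group_means` are hypothetical):

```r
# Illustrative stand-in for ex0112 (hypothetical values, not the Sleuth3 data)
toy <- data.frame(
  Diet = rep(c("FishOil", "RegularOil"), times = 7),
  BP   = c(0, 10, 2, 12, 13, -5, 5, 12, 5, 3, 13, 3, 8, 5)
)

# Keep only the BP values whose Diet is "RegularOil", then average them
regular_mean <- mean(toy$BP[toy$Diet == "RegularOil"])

# The tapply() route: one mean per diet, then pick a group by name
group_means <- with(toy, tapply(BP, Diet, mean))

regular_mean
group_means["RegularOil"]
```

Both lines should print the same value. The logical subset is handy when only one group matters, while tapply() gives every group in one call.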
Another option is dplyr:
library(dplyr)
ex0112 %>%
  group_by(Diet) %>%
  summarise(BP = mean(BP))
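I would suggest a base R approach. Using some example data, you can compute the mean of BP. The aggregate() function accepts a subset argument, so the calculation can be restricted to one diet.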
#Data
df <- structure(list(Diet = c("FishOil", "RegularOil", "FishOil", "RegularOil",
"FishOil", "RegularOil", "FishOil", "RegularOil", "FishOil",
"RegularOil", "FishOil", "RegularOil", "FishOil", "RegularOil"
), BP = c(0, 10, 2, 12, 13, -5, 5, 12, 5, 3, 13, 3, 8, 5)), class = "data.frame", row.names = c(NA,
-14L))
Code:
#Aggregate
aggregate(BP ~ Diet, data = df, FUN = mean, subset = Diet == 'RegularOil')
Output:
        Diet       BP
1 RegularOil 5.714286
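Leaving out the subset argument returns one row per diet, and the row of interest can be picked afterwards. A sketch using the same example values, rebuilt here so the block runs on its own; `all_means` and `regular_only` are names chosen only for illustration:

```r
# Same example values as the data above, rebuilt so this block is self-contained
df <- data.frame(
  Diet = rep(c("FishOil", "RegularOil"), times = 7),
  BP   = c(0, 10, 2, 12, 13, -5, 5, 12, 5, 3, 13, 3, 8, 5)
)

# Mean of BP for every level of Diet (formula interface: response ~ grouping)
all_means <- aggregate(BP ~ Diet, data = df, FUN = mean)

# Pick out a single group from the summary table
regular_only <- all_means[all_means$Diet == "RegularOil", ]

all_means
regular_only
```

This keeps the full summary around, which is convenient if the other diet is needed later for comparison.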
You can try this with data.table, using the data from @Duck:
library(data.table)
setDT(df)[, .(meanBP = mean(BP)), by = .(Diet)]
# Diet meanBP
# 1: FishOil 6.571429
# 2: RegularOil 5.714286
I would propose:
library(data.table)
ex0112.dt <- as.data.table(ex0112)
ex0112.dt[, mean(BP), by = .(Diet)]
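I recently ran into a few optimization problems where data.table really saved me.
That said, benchmarking it against the other solutions, tapply seems to be the winner :)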
(bench = microbenchmark::microbenchmark(
  with(ex0112, tapply(BP, Diet, mean)),
  ex0112 %>%
    group_by(Diet) %>%
    summarise(BP = mean(BP), .groups = "drop_last"),
  aggregate(BP ~ Diet, ex0112, mean),
  ex0112.dt[, mean(BP), by = .(Diet)],
  times = 1000L
))
ggplot2::autoplot(bench)