How can I summarize a data frame in a "long" format by multiple columns?
My data frame is organized as follows:
Variable 1 | Variable 2 | Variable 3 | Outcome Variable
---------- | ---------- | ---------- | ----------------
Factor | Factor | Factor | Outcome
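It has a few thousand rows, 15 variable columns, and 1 outcome column. I would like to summarize the table in the following long format (preferably using plyr):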
Variable 1 | Variable 2 | Variable 3 | Outcome Variable
---------- | ---------- | ---------- | ----------------
Factor 1 | Factor 1 | Factor 1 | Average Outcome
Factor 1 | Factor 1 | Factor 2 | Average Outcome
Factor 1 | Factor 2 | Factor 1 | Average Outcome
Factor 1 | Factor 2 | Factor 2 | Average Outcome
with one row for each combination of the variables. What is the simplest way to do this?
We can use dplyr:
library(dplyr)
df1 %>%
  group_by(variable1, variable2, variable3) %>%
  summarise(OutcomeVariable = mean(OutcomeVariable))
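With 15 variable columns, typing every name into `group_by()` gets tedious. A minimal sketch, assuming dplyr 1.0 or later and that every column other than the outcome is a grouping variable; the toy data and its column names are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: three grouping columns and one numeric outcome
df1 <- data.frame(
  variable1 = c("a", "a", "a", "b", "b", "b"),
  variable2 = c("x", "x", "y", "x", "y", "y"),
  variable3 = c("p", "p", "q", "p", "q", "q"),
  OutcomeVariable = c(1, 3, 5, 7, 9, 11)
)

# across(-OutcomeVariable) selects every column except the outcome,
# so the same call works unchanged with 15 grouping columns
res <- df1 %>%
  group_by(across(-OutcomeVariable)) %>%
  summarise(OutcomeVariable = mean(OutcomeVariable), .groups = "drop")

res
```

Here `.groups = "drop"` returns an ungrouped result, which is usually what you want from a final summary table.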
Or with base R:
aggregate(OutcomeVariable ~ ., df1, FUN = mean)
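To see what the formula interface does, here is a small self-contained sketch on made-up data (the column names are assumptions, not the asker's real ones). Since the question asked for plyr, the equivalent `ddply()` call is included too, guarded so the example still runs if plyr is not installed:

```r
# Hypothetical toy data: three grouping columns and one numeric outcome
df1 <- data.frame(
  variable1 = c("a", "a", "a", "b", "b", "b"),
  variable2 = c("x", "x", "y", "x", "y", "y"),
  variable3 = c("p", "p", "q", "p", "q", "q"),
  OutcomeVariable = c(1, 3, 5, 7, 9, 11)
)

# The dot on the right-hand side stands for "all other columns in df1",
# so every remaining column is used as a grouping variable
res_base <- aggregate(OutcomeVariable ~ ., data = df1, FUN = mean)
res_base

# plyr equivalent: group by every column except the outcome
if (requireNamespace("plyr", quietly = TRUE)) {
  group_cols <- setdiff(names(df1), "OutcomeVariable")
  res_plyr <- plyr::ddply(df1, group_cols, plyr::summarise,
                          OutcomeVariable = mean(OutcomeVariable))
  print(res_plyr)
}
```

Note that the formula method of `aggregate()` drops rows with missing values by default (`na.action = na.omit`), so the result can differ from the dplyr version if the data contain `NA`s.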