Incorrect use of ddply to summarise data in R
I'm having trouble with a fairly simple ddply operation. I have the following data frame:
+----------+----------+
| Expenses | Category |
+----------+----------+
| 735 | 1 |
| 992 | 2 |
| 943 | 1 |
| 995 | 3 |
| 914 | 3 |
| 935 | 1 |
| 956 | 3 |
| 946 | 2 |
| 978 | 1 |
| 924 | 1 |
+----------+----------+
I'm trying to compute N and the mean of Expenses for each Category by running the following:
ddply(df, .(Category), summarise, N = length(df$Expenses), mean = mean(df$Expenses))
But I get the same values for every group:
  Category  N  mean
1        1 10 931.8
2        2 10 931.8
3        3 10 931.8
Can you help me figure out what I'm doing wrong?
Here is the dput of df:
structure(list(Expenses = c(735, 992, 943, 995, 914, 935, 956,
946, 978, 924), Category = c(1L, 2L, 1L, 3L, 3L, 1L, 3L, 2L,
1L, 1L)), .Names = c("Expenses", "Category"), class = "data.frame", row.names = c(NA,
-10L))
An alternative using dplyr:
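library(dplyr)
# Group the rows by Category, then summarise each group
grouped_df <- group_by(df, Category)
# n() counts the rows in the current group; Expenses is the group's own column
summarized_df <- summarize(grouped_df,
                           N = n(),
                           mean = mean(Expenses))
summarized_df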
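For completeness, the original ddply call can probably be fixed without switching packages. Below is a minimal sketch, assuming plyr is installed and that df matches the dput output in the question. Inside summarise, df$Expenses refers to the full data frame, so every group sees all 10 rows; bare column names are evaluated within each Category piece instead. The name fixed is just an illustrative variable.

```r
library(plyr)

# Same data as the dput output in the question
df <- data.frame(
  Expenses = c(735, 992, 943, 995, 914, 935, 956, 946, 978, 924),
  Category = c(1L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 1L, 1L)
)

# Bare column names are looked up in each per-Category piece;
# df$Expenses would always return the full 10-row column.
fixed <- ddply(df, .(Category), summarise,
               N    = length(Expenses),
               mean = mean(Expenses))
fixed
#   Category N mean
# 1        1 5  903
# 2        2 2  969
# 3        3 3  955
```

If dplyr is attached in the same session, note that both packages export summarise, so qualifying the calls (plyr::summarise, dplyr::summarise) avoids masking surprises.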