R: sort a ddply summary by group sum

I have a data.frame like this:

x <- data.frame(Category=factor(c("One", "One", "Four", "Two","Two",
"Three", "Two", "Four","Three")),
City=factor(c("D","A","B","B","A","D","A","C","C")),
Frequency=c(10,1,5,2,14,8,20,3,5))

  Category City Frequency
1      One    D        10
2      One    A         1
3     Four    B         5
4      Two    B         2
5      Two    A        14
6    Three    D         8
7      Two    A        20
8     Four    C         3
9    Three    C         5

I want to create a pivot table with sum(Frequency), using the ddply function like this:

library(plyr)
ddply(x, .(Category, City), summarize, Total = sum(Frequency))
  Category City Total
1     Four    B     5
2     Four    C     3
3      One    A     1
4      One    D    10
5    Three    C     5
6    Three    D     8
7      Two    A    34
8      Two    B     2

But I need this result sorted by the total of each Category group, like this:

Category City Frequency
1      Two    A        34
2      Two    B         2
3    Three    D         8
4    Three    C         5
5      One    D        10
6      One    A         1
7     Four    B         5
8     Four    C         3

I have looked at and tried order, sort, and arrange, but nothing seems to do what I need. How can I do this in R?

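That's a nice question, and I can't think of a straightforward way to do it other than creating a total-size index and then sorting by it. Here's a possible data.table approach using the setorder function, which sorts the data by reference:
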
library(data.table)
Res <- setDT(x)[, .(Total = sum(Frequency)), by = .(Category, City)]
setorder(Res[, size := sum(Total), by = Category], -size, -Total, Category)[]
#    Category City Total size
# 1:      Two    A    34   36
# 2:      Two    B     2   36
# 3:    Three    D     8   13
# 4:    Three    C     5   13
# 5:      One    D    10   11
# 6:      One    A     1   11
# 7:     Four    B     5    8
# 8:     Four    C     3    8
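
As a side note on "by reference": the minimal sketch below, on a hypothetical toy table (dt, g and v are made-up names, not part of the question), illustrates that setorder reorders the table in place, so no assignment is needed. It also returns its input invisibly, which is why the call above ends with [] to print the result.

```r
library(data.table)

# Hypothetical toy table, not the question's data
dt <- data.table(g = c("a", "b", "c"), v = c(2, 9, 5))

# setorder() reorders dt in place; the result is not assigned to anything
setorder(dt, -v)

# dt itself is now sorted by v in descending order
dt$g
```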

Or, if you're deep in the Hadleyverse, we can get a similar result using the newer dplyr package (as suggested by @akrun):

library(dplyr)
x %>%
  group_by(Category, City) %>%
  summarise(Total = sum(Frequency)) %>%  # drops the City level; still grouped by Category
  mutate(size = sum(Total)) %>%          # so this is the per-Category sum
  ungroup() %>%
  arrange(-size, -Total, Category)

Here's a base R version, where DF is the result of your ddply call:

with(DF, DF[order(-ave(Total, Category, FUN=sum), Category, -Total), ])

Produces:

  Category City Total
7      Two    A    34
8      Two    B     2
6    Three    D     8
5    Three    C     5
4      One    D    10
3      One    A     1
1     Four    B     5
2     Four    C     3

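The logic is basically the same as David's: compute the sum of Total for each Category, use that number for all rows in that Category (we do this with ave(..., FUN=sum)), and then sort by that plus some tie-breakers to make sure the result comes out as expected.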
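
To make the role of ave() more concrete, here is a minimal self-contained sketch in base R only. It assumes aggregate() as a stand-in for the ddply call (so plyr is not required), and the names grp_sum and res are purely illustrative.

```r
x <- data.frame(
  Category = c("One", "One", "Four", "Two", "Two", "Three", "Two", "Four", "Three"),
  City = c("D", "A", "B", "B", "A", "D", "A", "C", "C"),
  Frequency = c(10, 1, 5, 2, 14, 8, 20, 3, 5)
)

# Assumption: aggregate() stands in for ddply(x, .(Category, City), ...)
DF <- aggregate(Frequency ~ Category + City, data = x, FUN = sum)
names(DF)[names(DF) == "Frequency"] <- "Total"

# ave() returns one value per row: the sum of Total within that row's Category
grp_sum <- ave(DF$Total, DF$Category, FUN = sum)

# Sort by group sum (descending), then Category, then Total (descending)
res <- DF[order(-grp_sum, DF$Category, -DF$Total), ]
res
```

Because grp_sum has one entry per row of DF, it can be passed straight to order() without being added to the data frame as a column.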