R: sort summarised ddply output by group sum
I have a data.frame like this:
x <- data.frame(Category=factor(c("One", "One", "Four", "Two","Two",
"Three", "Two", "Four","Three")),
City=factor(c("D","A","B","B","A","D","A","C","C")),
Frequency=c(10,1,5,2,14,8,20,3,5))
Category City Frequency
1 One D 10
2 One A 1
3 Four B 5
4 Two B 2
5 Two A 14
6 Three D 8
7 Two A 20
8 Four C 3
9 Three C 5
I want to create a pivot table with sum(Frequency), using the ddply function like this:
library(plyr)
ddply(x, .(Category, City), summarize, Total = sum(Frequency))
Category City Total
1 Four B 5
2 Four C 3
3 One A 1
4 One D 10
5 Three C 5
6 Three D 8
7 Two A 34
8 Two B 2
But I need this result sorted by the total of each Category group, largest group first. Like this:
Category City Frequency
1 Two A 34
2 Two B 2
3 Three D 8
4 Three C 5
5 One D 10
6 One A 1
7 Four B 5
8 Four C 3
I have looked at and tried sort, order, and arrange, but nothing seems to do what I need. How can I do this in R?
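This is a nice question, and I can't think of a direct way of doing this other than creating a total-size index and then sorting by it. Here is a possible data.table approach that uses the setorder function, which orders your data by reference: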
library(data.table)
Res <- setDT(x)[, .(Total = sum(Frequency)), by = .(Category, City)]
setorder(Res[, size := sum(Total), by = Category], -size, -Total, Category)[]
# Category City Total size
# 1: Two A 34 36
# 2: Two B 2 36
# 3: Three D 8 13
# 4: Three C 5 13
# 5: One D 10 11
# 6: One A 1 11
# 7: Four B 5 8
# 8: Four C 3 8
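To make "by reference" concrete, here is a minimal sketch on a small made-up table (the name dt and its columns are hypothetical, not from the question). It only illustrates that setorder reorders the table in place, so no assignment is needed:

```r
library(data.table)

# Hypothetical toy table, used only to illustrate in-place ordering
dt <- data.table(id = c("a", "b", "c"), value = c(2, 3, 1))

# setorder() reorders dt itself; the result is not assigned back
setorder(dt, -value)

dt$id
# [1] "b" "a" "c"
```

This is why the answer above can wrap setorder around the grouped := step and simply print the result with a trailing [].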
Or, if you are deep in the Hadleyverse, we can achieve a similar result with the newer dplyr package (as suggested by @akrun):
library(dplyr)
x %>%
  group_by(Category, City) %>%
  summarise(Total = sum(Frequency)) %>%  # drops the City grouping; still grouped by Category
  mutate(size = sum(Total)) %>%          # per-Category total, repeated on each row
  ungroup() %>%
  arrange(-size, -Total, Category)
Here is a base R version, where DF is the result of your ddply call:
with(DF, DF[order(-ave(Total, Category, FUN=sum), Category, -Total), ])
Produces:
Category City Total
7 Two A 34
8 Two B 2
6 Three D 8
5 Three C 5
4 One D 10
3 One A 1
1 Four B 5
2 Four C 3
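The logic is basically the same as David's: compute the sum of Total for each Category, use that number for every row within that Category (we do this with ave(..., FUN=sum)), and then sort by it, plus some tie-breakers to make sure the result comes out as expected.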
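To see what the ave step contributes, here is a small sketch that rebuilds the ddply result by hand (the DF below is typed in from the output shown earlier, and the helper names size and sorted are just for illustration):

```r
# The ddply result from the question, rebuilt by hand for illustration
DF <- data.frame(
  Category = c("Four", "Four", "One", "One", "Three", "Three", "Two", "Two"),
  City     = c("B", "C", "A", "D", "C", "D", "A", "B"),
  Total    = c(5, 3, 1, 10, 5, 8, 34, 2)
)

# ave() returns a vector as long as Total, with each Category's sum
# repeated on every row of that Category
size <- ave(DF$Total, DF$Category, FUN = sum)
size
# [1]  8  8 11 11 13 13 36 36

# Sort by group size (descending), then Category, then Total (descending)
sorted <- DF[order(-size, DF$Category, -DF$Total), ]
```

Because size has one entry per row, it can be passed straight to order() alongside the other columns, with no merge or join needed.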