R data.table sorting by group with "other" at bottom of each group
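I can't quite figure out the syntax for this. I have a data.table that I want to sort first by a grouping column `g1` (an ordered factor), then in descending order by another column `n`. The only catch is that I want rows labeled "other" in a third column `g2` to appear at the bottom of each group, regardless of their value of `n`.
Example: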
library(data.table)
dt <- data.table(g1 = factor(rep(c('Australia', 'Mexico', 'Canada'), 3), levels = c('Australia', 'Canada', 'Mexico')),
g2 = rep(c('stuff', 'things', 'other'), each = 3),
n = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0))
This is the expected output: within each `g1`, rows are in descending order of `n`, except that rows with `g2 == 'other'` always sit at the bottom:
g1 g2 n
1: Australia things 5000
2: Australia stuff 1000
3: Australia other 10000
4: Canada things 3500
5: Canada stuff 3000
6: Canada other 0
7: Mexico stuff 2000
8: Mexico things 100
9: Mexico other 10000
Take advantage of `order` inside data.table's `[` and its `-` prefix for reverse ordering:
dt[order(g1, g2 == "other", -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia things 5000
# 2: Australia stuff 1000
# 3: Australia other 10000
# 4: Canada things 3500
# 5: Canada stuff 3000
# 6: Canada other 0
# 7: Mexico stuff 2000
# 8: Mexico things 100
# 9: Mexico other 10000
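We add `g2 == "other"` as a sort key because you said "other" should always come last. For example, if "stuff" were "abc" instead, we can see the difference in behavior between sorting without that key and sorting with it: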
dt[ g2 == "stuff", g2 := "abc" ]
dt[order(g1, -n), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia other 10000
# 2: Australia things 5000
# 3: Australia abc 1000
# 4: Canada things 3500
# 5: Canada abc 3000
# 6: Canada other 0
# 7: Mexico other 10000
# 8: Mexico abc 2000
# 9: Mexico things 100
dt[order(g1, g2 == "other", -g2), ]
# g1 g2 n
# <fctr> <char> <num>
# 1: Australia things 5000
# 2: Australia abc 1000
# 3: Australia other 10000
# 4: Canada things 3500
# 5: Canada abc 3000
# 6: Canada other 0
# 7: Mexico things 100
# 8: Mexico abc 2000
# 9: Mexico other 10000
One drawback of this approach is that `setorder` does not work directly with it:
setorder(dt, g1, g2 == "other", -n)
# Error in setorderv(x, cols, order, na.last) :
# some columns are not in the data.table: ==,other
So we need to reorder and assign the result back to `dt`.
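A minimal sketch of that reassignment, plus one possible by-reference alternative. The names `sorted_copy` and `is_other` are made up for illustration, and the helper-column route is an assumption about one way to keep using `setorder`, not something the answer above prescribes.

```r
library(data.table)

# Rebuild the example data so the sketch is self-contained.
dt <- data.table(
  g1 = factor(rep(c("Australia", "Mexico", "Canada"), 3),
              levels = c("Australia", "Canada", "Mexico")),
  g2 = rep(c("stuff", "things", "other"), each = 3),
  n  = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0)
)

# Option 1: sort into a new object (or assign back with dt <- dt[...]).
sorted_copy <- dt[order(g1, g2 == "other", -n)]

# Option 2: store the condition in a helper column (is_other is a
# hypothetical name), sort by reference with setorder, then drop the helper.
dt[, is_other := g2 == "other"]
setorder(dt, g1, is_other, -n)
dt[, is_other := NULL]
```

Option 1 is the shortest; option 2 avoids copying the table, which may matter for large data.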
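As an aside: this works because `g2 == "other"` evaluates to a logical vector, and when sorting, logicals are treated as 0 (FALSE) and 1 (TRUE). Rows where the condition is false therefore come before rows where it is true.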
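The ordering of logicals can be checked in base R without any data.table involved. The vector `flags` below is made-up illustration data.

```r
# Hypothetical logical vector standing in for g2 == "other".
flags <- c(TRUE, FALSE, TRUE, FALSE)

as_int   <- as.integer(flags)  # 1 0 1 0: TRUE maps to 1, FALSE to 0
idx_last <- order(flags)       # 2 4 1 3: FALSE positions first, TRUE last
sorted   <- sort(flags)        # FALSE FALSE TRUE TRUE

# Negating the condition puts the matching rows first instead.
idx_first <- order(!flags)     # 1 3 2 4
```

Ties keep their original relative order, which is why the later sort keys (here `-n`) decide the order within the FALSE block and within the TRUE block.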