dplyr: conditionally rank a column based on a condition of another?
Hi, suppose I have a table like this. I want to rank by 'percent', but only based on the rows where the 'cat' column is "HIGH", ignoring the "LOW" rows.
name cat Freq percent
1 berry HIGH 259 0.583
2 jack HIGH 45 0.634
3 steve HIGH 331 0.943
4 nadia HIGH 304 0.580
5 jacob HIGH 179 0.844
6 susan HIGH 15 0.833
7 luthered HIGH 14 0.264
8 jane HIGH 99 0.513
9 berry LOW 185 0.417
10 jack LOW 26 0.366
11 steve LOW 20 0.057
12 nadia LOW 220 0.420
13 jacob LOW 33 0.156
14 susan LOW 3 0.167
15 luthered LOW 39 0.736
16 jane LOW 94 0.487
I can't seem to get the condition to work. Here is the data:
temp = structure(list(name = c("berry", "jack", "steve", "nadia", "jacob",
"susan", "luthered", "jane", "berry", "jack", "steve", "nadia",
"jacob", "susan", "luthered", "jane"), cat = c("HIGH", "HIGH",
"HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "LOW", "LOW",
"LOW", "LOW", "LOW", "LOW", "LOW", "LOW"), Freq = c(259L, 45L,
331L, 304L, 179L, 15L, 14L, 99L, 185L, 26L, 20L, 220L, 33L, 3L,
39L, 94L), percent = c(0.583, 0.634, 0.943, 0.58, 0.844, 0.833,
0.264, 0.513, 0.417, 0.366, 0.057, 0.42, 0.156, 0.167, 0.736,
0.487)), class = "data.frame", row.names = c(NA, -16L))
I tried this, but the order is not right.
temp %>% arrange(desc ( percent), cat =="HIGH" )
If sorted correctly, the names should come out in this order:
steve
jacob
susan
jack
berry
nadia
jane
luthered
Thanks in advance.
We can use
temp %>%
  arrange(replace(rep(n() + 1, n()), cat == "HIGH",
                  dense_rank(-percent[cat == "HIGH"])))
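To see what the `replace()` expression is doing, it may help to materialise the sort key as a column first. This is only a sketch, assuming the same data rebuilt by hand without the `Freq` column; `keyed` and `sort_key` are names made up for illustration.

```r
library(dplyr)

# Same names, groups and percentages as in the question (Freq left out)
temp <- data.frame(
  name = rep(c("berry", "jack", "steve", "nadia",
               "jacob", "susan", "luthered", "jane"), 2),
  cat = rep(c("HIGH", "LOW"), each = 8),
  percent = c(0.583, 0.634, 0.943, 0.58, 0.844, 0.833, 0.264, 0.513,
              0.417, 0.366, 0.057, 0.42, 0.156, 0.167, 0.736, 0.487),
  stringsAsFactors = FALSE
)

# Start from a vector filled with n() + 1 (larger than any possible rank),
# then overwrite the HIGH positions with their descending rank on percent
keyed <- temp %>%
  mutate(sort_key = replace(rep(n() + 1, n()), cat == "HIGH",
                            dense_rank(-percent[cat == "HIGH"])))

# Sorting on that key puts the HIGH rows first, highest percent on top
keyed_sorted <- keyed %>% arrange(sort_key)
print(keyed_sorted)
```

The HIGH rows get ranks 1 to 8 and every other row gets 17, so the LOW rows all tie on the key and only the HIGH rows are actually reordered.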
Or we can also use
temp %>%
  group_by(cat) %>%
  group_modify(~ .x %>%
    arrange(if(.y$cat == "HIGH") desc(percent) else n() + 1 )) %>%
  ungroup
-output
# A tibble: 16 × 4
cat name Freq percent
<chr> <chr> <int> <dbl>
1 HIGH steve 331 0.943
2 HIGH jacob 179 0.844
3 HIGH susan 15 0.833
4 HIGH jack 45 0.634
5 HIGH berry 259 0.583
6 HIGH nadia 304 0.58
7 HIGH jane 99 0.513
8 HIGH luthered 14 0.264
9 LOW berry 185 0.417
10 LOW jack 26 0.366
11 LOW steve 20 0.057
12 LOW nadia 220 0.42
13 LOW jacob 33 0.156
14 LOW susan 3 0.167
15 LOW luthered 39 0.736
16 LOW jane 94 0.487
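If the conditional inside `arrange()` feels hard to read, the same row order can probably be reached by splitting the data, sorting only the HIGH part, and stacking the pieces again. This is a sketch of a swapped-in technique, not the approach above; it assumes the same data rebuilt by hand without `Freq`, and `high_sorted`, `low_as_is` and `stacked` are illustrative names.

```r
library(dplyr)

# Same names, groups and percentages as in the question (Freq left out)
temp <- data.frame(
  name = rep(c("berry", "jack", "steve", "nadia",
               "jacob", "susan", "luthered", "jane"), 2),
  cat = rep(c("HIGH", "LOW"), each = 8),
  percent = c(0.583, 0.634, 0.943, 0.58, 0.844, 0.833, 0.264, 0.513,
              0.417, 0.366, 0.057, 0.42, 0.156, 0.167, 0.736, 0.487),
  stringsAsFactors = FALSE
)

# Sort only the HIGH rows by descending percent
high_sorted <- temp %>%
  filter(cat == "HIGH") %>%
  arrange(desc(percent))

# Leave every other row exactly as it came in
low_as_is <- temp %>%
  filter(cat != "HIGH")

# Stack the sorted HIGH rows on top of the untouched rows
stacked <- bind_rows(high_sorted, low_as_is)
print(stacked)
```

If only the ordered names are needed, `pull(high_sorted, name)` returns them as a character vector.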
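Or, if the rows of both 'cat' groups should be ordered by the 'percent' value corresponding to 'HIGH' (keeping the HIGH and LOW rows of each name together)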
temp %>%
  arrange(factor(name, levels = unique(name[cat == "HIGH"
     ][order(dense_rank(-percent[cat == "HIGH"]))])))
-output
name cat Freq percent
1 steve HIGH 331 0.943
2 steve LOW 20 0.057
3 jacob HIGH 179 0.844
4 jacob LOW 33 0.156
5 susan HIGH 15 0.833
6 susan LOW 3 0.167
7 jack HIGH 45 0.634
8 jack LOW 26 0.366
9 berry HIGH 259 0.583
10 berry LOW 185 0.417
11 nadia HIGH 304 0.580
12 nadia LOW 220 0.420
13 jane HIGH 99 0.513
14 jane LOW 94 0.487
15 luthered HIGH 14 0.264
16 luthered LOW 39 0.736
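The factor-levels trick can also be written in two steps, which may be easier to follow: first extract the name order from the HIGH rows, then sort every row by its position in that order. This is a sketch using `match()` instead of `factor()`, assuming the same data rebuilt by hand without `Freq`; `name_order` and `paired` are illustrative names.

```r
library(dplyr)

# Same names, groups and percentages as in the question (Freq left out)
temp <- data.frame(
  name = rep(c("berry", "jack", "steve", "nadia",
               "jacob", "susan", "luthered", "jane"), 2),
  cat = rep(c("HIGH", "LOW"), each = 8),
  percent = c(0.583, 0.634, 0.943, 0.58, 0.844, 0.833, 0.264, 0.513,
              0.417, 0.366, 0.057, 0.42, 0.156, 0.167, 0.736, 0.487),
  stringsAsFactors = FALSE
)

# Names ordered by descending percent, looking at the HIGH rows only
name_order <- temp %>%
  filter(cat == "HIGH") %>%
  arrange(desc(percent)) %>%
  pull(name)

# Sort all rows by each name's position in name_order;
# the second key, cat, puts HIGH before LOW within a name
paired <- temp %>%
  arrange(match(name, name_order), cat)
print(paired)
```

A name that has no HIGH row would get `NA` from `match()`, and `arrange()` places `NA` keys last.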