cur_group_id by size rather than alphabetical order

Question

I have the following data frame:

df <- structure(list(s_do_h_patients_state = c("NC", "NC", NA, NA, 
"MN", "MN", "UT", "UT", "IL", "IL"), diabetes = c(FALSE, TRUE, 
FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE), n = c(24191L, 
5684L, 24386L, 3820L, 18768L, 2423L, 19732L, 1313L, 15670L, 2336L
), p = c(0.809740585774059, 0.190259414225941, 0.864567822449124, 
0.135432177550876, 0.88565900618187, 0.11434099381813, 0.937609883582799, 
0.0623901164172012, 0.870265467066533, 0.129734532933467), N = c(29875L, 
29875L, 28206L, 28206L, 21191L, 21191L, 21045L, 21045L, 18006L, 
18006L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))

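I want to add another column that enumerates the groups in their current order (largest N first), so the output would be c(1,1,2,2,3,3...).

One approach is group_indices, but it numbers the groups alphabetically rather than by group size. What is the right way to achieve this?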
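The alphabetical numbering can be reproduced with a minimal sketch. The `toy` tibble and its `state` column are hypothetical stand-ins for the real data, and `cur_group_id()` is used here in place of `group_indices()`:

```r
library(dplyr)

# Hypothetical toy data: "NC" appears first but sorts after "IL".
toy <- tibble(state = c("NC", "NC", "IL", "IL"))

# group_by() sorts the group keys, so "IL" becomes group 1 and "NC" group 2,
# regardless of the order in which the rows appear.
alphabetical <- toy %>%
  group_by(state) %>%
  mutate(id = cur_group_id()) %>%
  ungroup()

alphabetical$id
# 2 2 1 1
```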

Answer 1

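Here is a simple way to solve this. Because the rows are already ordered by group size, you can number each state by the position of its first appearance: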

library(dplyr)
df %>% mutate(group = match(s_do_h_patients_state, unique(s_do_h_patients_state)))

Output

# A tibble: 10 x 6
   s_do_h_patients_state diabetes     n      p     N group
   <chr>                 <lgl>    <int>  <dbl> <int> <int>
 1 NC                    FALSE    24191 0.810  29875     1
 2 NC                    TRUE      5684 0.190  29875     1
 3 NA                    FALSE    24386 0.865  28206     2
 4 NA                    TRUE      3820 0.135  28206     2
 5 MN                    FALSE    18768 0.886  21191     3
 6 MN                    TRUE      2423 0.114  21191     3
 7 UT                    FALSE    19732 0.938  21045     4
 8 UT                    TRUE      1313 0.0624 21045     4
 9 IL                    FALSE    15670 0.870  18006     5
10 IL                    TRUE      2336 0.130  18006     5

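Note that you cannot use rleid, because it numbers runs of consecutive identical values, so a state that reappears later gets a new id: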

> data.table::rleid(c("NC", "NC", "IL", "NC"))
[1] 1 1 2 3
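For comparison, here is a small sketch that runs both approaches on that same vector. The variable names are illustrative only, and it assumes the data.table package is installed:

```r
x <- c("NC", "NC", "IL", "NC")

# rleid() starts a new id whenever the value changes from the previous row.
run_ids <- data.table::rleid(x)
run_ids
# 1 1 2 3

# match() against unique() reuses the id of the first appearance.
first_seen_ids <- match(x, unique(x))
first_seen_ids
# 1 1 2 1
```

The two only agree when every group occupies one contiguous block of rows.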

Answer 2

library(dplyr)
# Sort by group total N (largest first), then number the distinct N values
df %>% arrange(desc(N)) %>%
  mutate(id = dense_rank(desc(N)))
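One caveat with ranking on N is that two groups with the same total would share an id. The sketch below is one possible way around that, not part of the original answer. The `toy` tibble and its column names are hypothetical, and ties are broken alphabetically by assumption:

```r
library(dplyr)

# Hypothetical toy data: "B" and "C" deliberately share the same total N.
toy <- tibble(
  state = c("A", "A", "B", "B", "C", "C"),
  N     = c(30L, 30L, 20L, 20L, 20L, 20L)
)

# dense_rank() alone gives the tied groups "B" and "C" the same id.
tied <- toy %>% mutate(id = dense_rank(desc(N)))

# Order the distinct groups by N (ties broken by name), then look up
# each row's position in that ordering.
ordered_states <- toy %>%
  distinct(state, N) %>%
  arrange(desc(N), state) %>%
  pull(state)

untied <- toy %>% mutate(id = match(state, ordered_states))
```

Here `tied$id` has only two distinct values, while `untied$id` has one id per state.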

grouping

r

dplyr