cur_group_id by size rather than alphabetical order
I have the following data frame:
df <- structure(list(s_do_h_patients_state = c("NC", "NC", NA, NA,
"MN", "MN", "UT", "UT", "IL", "IL"), diabetes = c(FALSE, TRUE,
FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE), n = c(24191L,
5684L, 24386L, 3820L, 18768L, 2423L, 19732L, 1313L, 15670L, 2336L
), p = c(0.809740585774059, 0.190259414225941, 0.864567822449124,
0.135432177550876, 0.88565900618187, 0.11434099381813, 0.937609883582799,
0.0623901164172012, 0.870265467066533, 0.129734532933467), N = c(29875L,
29875L, 28206L, 28206L, 21191L, 21191L, 21045L, 21045L, 18006L,
18006L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
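I want to add another column that enumerates the groups, so that the output would be c(1,1,2,2,3,3...).
One way is group_indices, but it numbers the groups in alphabetical order rather than by group size.
What is the right way to achieve this?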
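Here is a simple way to solve this. It numbers the groups in order of first appearance, which matches the order by size here because the rows are already sorted by descending N: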
library(dplyr)
df %>% mutate(group = match(s_do_h_patients_state, unique(s_do_h_patients_state)))
Output:
# A tibble: 10 x 6
   s_do_h_patients_state diabetes     n      p     N group
   <chr>                 <lgl>    <int>  <dbl> <int> <int>
 1 NC                    FALSE    24191 0.810  29875     1
 2 NC                    TRUE      5684 0.190  29875     1
 3 NA                    FALSE    24386 0.865  28206     2
 4 NA                    TRUE      3820 0.135  28206     2
 5 MN                    FALSE    18768 0.886  21191     3
 6 MN                    TRUE      2423 0.114  21191     3
 7 UT                    FALSE    19732 0.938  21045     4
 8 UT                    TRUE      1313 0.0624 21045     4
 9 IL                    FALSE    15670 0.870  18006     5
10 IL                    TRUE      2336 0.130  18006     5
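If the rows were not already sorted by group size, numbering by first appearance would no longer follow the size order. A minimal sketch of one way to handle that, assuming a small hypothetical data frame `toy` (the names `toy`, `state`, `size_order` and `result` are illustrative and not part of the question):

```r
library(dplyr)

# Hypothetical toy data, deliberately NOT sorted by group size
toy <- tibble(
  state = c("IL", "IL", "NC", "NC", NA, NA),
  N     = c(18006L, 18006L, 29875L, 29875L, 28206L, 28206L)
)

# One row per group, ordered from the largest N to the smallest
size_order <- toy %>%
  distinct(state, N) %>%
  arrange(desc(N)) %>%
  pull(state)

# match() returns each row's position in the size-ordered vector;
# it also matches NA against NA, so the NA group gets its own id
result <- toy %>% mutate(group = match(state, size_order))
```

The original row order is left untouched; only the lookup vector is sorted.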
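Note that you cannot use rleid here, because it numbers consecutive runs rather than distinct values, so a state that reappears after a different one gets a new id: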
> data.table::rleid(c("NC", "NC", "IL", "NC"))
[1] 1 1 2 3
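The difference can be seen side by side on the same vector. This is a sketch in base R only; the run ids are rebuilt with `rle()` as a stand-in for `rleid`, and the variable names are illustrative:

```r
x <- c("NC", "NC", "IL", "NC")

# Run-length ids: every new run gets a new id, even for a repeated value
runs   <- rle(x)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Ids by first appearance: a repeated value keeps its original id
group_id <- match(x, unique(x))

print(run_id)    # the last "NC" starts a third run
print(group_id)  # the last "NC" is still group 1
```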
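# Alternative: rank the groups directly by N, largest first
# (groups that happen to have the same N would share an id)
df %>% arrange(desc(N)) %>%
  mutate(id = dense_rank(desc(N)))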
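One caveat with ranking on N is that two different groups with the same total would receive the same id. A hedged sketch of the issue and one possible tie-break, using hypothetical data (`toy`, `state`, `by_n` and `tie_broken` are made-up names, not from the question):

```r
library(dplyr)

# Hypothetical data in which groups "A" and "B" have the same N
toy <- tibble(
  state = c("A", "A", "B", "B", "C", "C"),
  N     = c(10L, 10L, 10L, 10L, 5L, 5L)
)

# dense_rank on N alone: "A" and "B" collapse into id 1
by_n <- toy %>% mutate(id = dense_rank(desc(N)))

# Sort by size, break ties by state, then number by first appearance
tie_broken <- toy %>%
  arrange(desc(N), state) %>%
  mutate(id = match(state, unique(state)))
```

Whether tied groups should share an id or be separated depends on what the id is used for.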