Group data by intervals and condition

Question

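I have the following problem: I want to bin incomes (random values, 1-100) into intervals and group them by sex, showing how many cases of each sex fall into each interval. I would also like the proportion and percentage for each group: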

library(dplyr)  # provides tibble() and case_when()

ingresos <- sample(0:100, 30, replace = TRUE)
sexo <- sample(1:2, 30, replace = TRUE)


base<-tibble(Ingresos=ingresos<-case_when(
    ingresos>=0 & ingresos<20 ~ "(0, 19]",
    ingresos>=20 & ingresos<50 ~ "(20, 49]",
    ingresos>=50 & ingresos<70 ~ "(50, 69]",
    ingresos>=70 ~ "(70 ó +)"
  ) , Sexo=sexo, Proporción=ingresos/sum(ingresos), Porcentaje=Proporción*100)

I end up with:

> show(base)
# A tibble: 30 x 4
   Ingresos  Sexo Proporción Porcentaje
   <chr>    <int>      <dbl>      <dbl>
 1 (0, 19]      2    0.00583      0.583
 2 (50, 69]     1    0.0343       3.43 
 3 (20, 49]     2    0.0233       2.33 
 4 (20, 49]     1    0.0188       1.88 
 5 (20, 49]     2    0.0311       3.11 
 6 (50, 69]     2    0.0369       3.69 
 7 (20, 49]     1    0.0278       2.78 
 8 (20, 49]     1    0.0142       1.42 
 9 (70 ó +)     1    0.0628       6.28 
10 (20, 49]     1    0.0130       1.30 
# … with 20 more rows

I am looking for something like this:

Ingresos Sexo Cases Proporción Porcentaje
(0,19]     1    12     .xxx       x.xxx
(0,19]     2    20     .xxx       x.xxx
(20,49]    1    17     .xxx       x.xxx
(20,49]    2    30     .xxx       x.xxx

Answer 1

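You can use cut() to split the ingresos vector into ranges. The frequencies can be obtained with dplyr::count(), and the proportion and percentage can be added with dplyr::mutate(). Like this: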

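library(dplyr)

ingresos <- sample(0:100, 30, replace = TRUE)
sexo <- sample(1:2, 30, replace = TRUE)

tibble(ingresos, sexo) %>%
  # bin incomes into right-closed intervals: (0,20], (20,50], ...
  mutate(ingresos = cut(ingresos, c(0, 20, 50, 70, 100))) %>%
  # number of cases per interval and sex
  count(ingresos, sexo) %>%
  # share of all cases, as a proportion and as a percentage
  mutate(Proporción = n / sum(n), Porcentaje = Proporción * 100)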
#> # A tibble: 8 x 5
#>   ingresos  sexo     n Proporción Porcentaje
#>   <fct>    <int> <int>      <dbl>      <dbl>
#> 1 (0,20]       1     3     0.1         10   
#> 2 (0,20]       2     4     0.133       13.3 
#> 3 (20,50]      1     2     0.0667       6.67
#> 4 (20,50]      2     5     0.167       16.7 
#> 5 (50,70]      1     3     0.1         10   
#> 6 (50,70]      2     1     0.0333       3.33
#> 7 (70,100]     1     4     0.133       13.3 
#> 8 (70,100]     2     8     0.267       26.7
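
One detail worth checking: by default cut() builds right-closed intervals and excludes the lowest break, so with the breaks above an income of exactly 0 becomes NA and an income of exactly 20 lands in the first bin, whereas the case_when() in the question puts 20 in the second bin. The following is a minimal sketch of one way to reproduce the question's left-closed bins, not a definitive version. The fixed seed, the Inf upper break, the bin labels, the Cases column name, and the ASCII spelling Proporcion (chosen so the script does not depend on the locale) are all assumptions made for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
ingresos <- sample(0:100, 30, replace = TRUE)
sexo <- sample(1:2, 30, replace = TRUE)

res <- tibble(ingresos, sexo) %>%
  mutate(
    ingresos = cut(
      ingresos,
      breaks = c(0, 20, 50, 70, Inf),
      right = FALSE,  # left-closed bins [a, b), so 0 falls in the first bin
      labels = c("0-19", "20-49", "50-69", "70+")  # hypothetical labels
    )
  ) %>%
  # one row per interval/sex combination that actually occurs
  count(ingresos, sexo, name = "Cases") %>%
  # shares are relative to all 30 cases
  mutate(Proporcion = Cases / sum(Cases), Porcentaje = Proporcion * 100)

res
```

If the shares should instead be computed within each sex, adding group_by(sexo) before the final mutate() would make sum(Cases) the per-sex total.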

