How to calculate the percentage between two variables for specific observations in R?

Question

I'm trying to calculate the incidence/percentage of a binary variable in relation to a variable with 5 (+ one NA) different income levels. I'm using:

afghan %>% group_by(income)  %>% 
  summarize(violent.exp.ISAF = n()) %>%
  mutate(Percentage = violent.exp.ISAF/sum(violent.exp.ISAF)*100)

But this gives the overall percentage of the binary variable relative to the whole table, rather than within each income range, as shown below:

# income          violent.exp.taliban Percentage
#  <chr>                         <int>      <dbl>
#1 10,001-20,000                   616     22.4  
#2 2,001-10,000                   1420     51.6  
#3 20,001-30,000                    93      3.38 
#4 less than 2,000                 457     16.6  
#5 over 30,000                      14      0.508
#6 NA                              154      5.59

I want the percentage of the binary variable within each specific income range. Any suggestions?

Sample of the afghan dataset:

> dput(head(afghan))
structure(list(province = c("Logar", "Logar", "Logar", "Logar", 
"Logar", "Logar"), district = c("Baraki Barak", "Baraki Barak", 
"Baraki Barak", "Baraki Barak", "Baraki Barak", "Baraki Barak"
), village.id = c(80, 80, 80, 80, 80, 80), age = c(26, 49, 60, 
34, 21, 18), educ.years = c(10, 3, 0, 14, 12, 10), employed = c(0, 
1, 1, 1, 1, 1), income = c("2,001-10,000", "2,001-10,000", "2,001-10,000", 
"2,001-10,000", "2,001-10,000", NA), violent.exp.ISAF = c(0, 
0, 1, 0, 0, 0), violent.exp.taliban = c(0, 0, 0, 0, 0, 0), list.group = c("control", 
"control", "control", "ISAF", "ISAF", "ISAF"), list.response = c(0, 
1, 1, 3, 3, 2)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Answer 1

Using dplyr/tidyverse and janitor, you can do:

library(tidyverse)
library(janitor)

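afghan %>% 
  tabyl(income, violent.exp.ISAF) %>%  # cross-tabulate counts: income (rows) x outcome (columns); no group_by() needed
  adorn_percentages() %>%              # default denominator is "row", i.e. within each income level
  adorn_pct_formatting()               # format the proportions as percentage strings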

This gives you the percentage distribution within each income level:

       income      0     1
 2,001-10,000  80.0% 20.0%
         <NA> 100.0%  0.0%
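
As a cross-check on the row-wise percentages above, here is a minimal base R sketch of the same calculation. The `afghan_sample` data frame is a hypothetical stand-in holding only the `income` and `violent.exp.ISAF` columns of the six sample rows from the question; `margin = 1` tells `prop.table()` to divide each count by its row total, which corresponds to the row denominator used in the table above.

```r
# Hypothetical stand-in for `afghan`: the six sample rows from the question
afghan_sample <- data.frame(
  income = c(rep("2,001-10,000", 5), NA),
  violent.exp.ISAF = c(0, 0, 1, 0, 0, 0)
)

# Cross-tabulate counts, keeping NA income as its own row
counts <- table(afghan_sample$income, afghan_sample$violent.exp.ISAF,
                useNA = "ifany")

# margin = 1 divides each cell by its row total (within each income level)
row_pct <- prop.table(counts, margin = 1) * 100
round(row_pct, 1)
```

Using `margin = 2` instead would give percentages within each value of `violent.exp.ISAF`, which is not what the question asks for.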

To create a tibble:

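afghan_tibble <- afghan %>% 
  tabyl(income, violent.exp.ISAF) %>%  # counts by income level and outcome; no group_by() needed
  adorn_percentages() %>%              # row-wise proportions (within each income level)
  adorn_pct_formatting() %>%           # format as percentage strings
  as_tibble()                          # convert the tabyl to a tibble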
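
If you would rather stay close to the pipeline in the question and avoid janitor, a dplyr-only version might look like the sketch below. It is an illustration rather than a definitive solution: `afghan_sample` and `pct_by_income` are hypothetical names, and the data frame holds only the two relevant columns of the six sample rows. The idea is to count each income/outcome combination first and then compute the percentage within each income group, so the denominator is the group total rather than the whole table.

```r
library(dplyr)

# Hypothetical stand-in for `afghan`: the six sample rows from the question
afghan_sample <- data.frame(
  income = c(rep("2,001-10,000", 5), NA),
  violent.exp.ISAF = c(0, 0, 1, 0, 0, 0)
)

pct_by_income <- afghan_sample %>%
  count(income, violent.exp.ISAF) %>%        # one row per income x outcome combination
  group_by(income) %>%                       # sum(n) below is taken within each income level
  mutate(Percentage = n / sum(n) * 100) %>%
  ungroup()

pct_by_income
```

This returns the result in long format (one row per income level and outcome value), with NA income kept as its own group. For the sample rows it agrees with the 80%/20% split in the table above.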
