Calculate percentage of HIV cases in different US states using R

Question

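I have a dataset containing the absolute number of HIV cases in four US states over a 2-year period.

The dataset has three columns: date (Jan 2018, Feb 2018, ...), state (CA, NY, FL, MA), and abs_cases. I am ignoring changes in population over time. I now want to calculate the relative number of cases for each state using its population. Here is some example population data:

pop <- c("CA" = 11111, "NY" = 22222, "FL" = 33333, "MA" = 444444)

I have tried using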

df%>%
group_by(state)%>%
summarize(rel_cases= state/pop)

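but it divides each state's values several times, once by each of the different populations. How can I divide only the FL values by the population of Florida, and so on for the other states?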

Answer 1

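Without an example dataset, this is my best guess at what you are trying to do. You can use enframe to convert pop into a data frame and join it to df by state. You can then calculate the cases per population for each state in each month.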

library(tidyverse)

pop <- c("CA"= 11111, "NY"= 22222, "FL"= 33333,"MA"= 444444)
pop <- enframe(pop, "state", "pop_num")

df %>%
  left_join(pop, by = "state") %>%
  mutate(rel_cases = abs_cases/pop_num)
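
As an alternative to the join, the named vector pop can be indexed directly by state name. The following is only a minimal base R sketch: the df built here is hypothetical toy data (the case counts are made up), and pct_cases is an assumed column name for the ratio expressed as a percentage.

```r
# Hypothetical toy data standing in for the asker's df (case counts are made up)
df <- data.frame(
  date      = rep(c("Jan 2018", "Feb 2018"), times = 4),
  state     = rep(c("CA", "NY", "FL", "MA"), each = 2),
  abs_cases = c(10, 12, 20, 22, 30, 33, 40, 44)
)

# Example population figures from the question
pop <- c("CA" = 11111, "NY" = 22222, "FL" = 33333, "MA" = 444444)

# Index pop by state name so each row picks up its own state's population.
# as.character() guards against state being a factor, which would index by integer code.
df$pop_num   <- unname(pop[as.character(df$state)])
df$rel_cases <- df$abs_cases / df$pop_num
df$pct_cases <- 100 * df$rel_cases  # the same ratio expressed as a percentage

df
```

A state that is missing from pop would get NA here, much as left_join would leave pop_num as NA for an unmatched state.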

Answer 2

You just need to merge the population data with the case data beforehand.

library(dplyr)

# Case data
df = data.frame(
  state = c("A","B","C"),
  time = c(1,2,3),
  cases = rnorm(n = 9, mean = 100, sd = 50)
)

# Population data
pop = data.frame(
  state = c("A","B","C"),
  population = c(1000, 2000, 1500)
)

# We can use left_join to merge the two
df %>% 
  left_join(pop)
#> Joining, by = "state"
#>   state time     cases population
#> 1     A    1 120.58345       1000
#> 2     B    2 142.00035       2000
#> 3     C    3  94.35658       1500
#> 4     A    1  86.91845       1000
#> 5     B    2 222.63554       2000
#> 6     C    3 107.99530       1500
#> 7     A    1 144.48939       1000
#> 8     B    2 178.82640       2000
#> 9     C    3 149.46918       1500

# Finally make our summary
df %>% 
  left_join(pop) %>% 
  group_by(state, time) %>% 
  summarise(rel_cases = cases / population)
#> Joining, by = "state"
#> `summarise()` has grouped output by 'state', 'time'. You can override using the `.groups` argument.
#> # A tibble: 9 x 3
#> # Groups:   state, time [3]
#>   state  time rel_cases
#>   <chr> <dbl>     <dbl>
#> 1 A         1    0.121 
#> 2 A         1    0.0869
#> 3 A         1    0.144 
#> 4 B         2    0.0710
#> 5 B         2    0.111 
#> 6 B         2    0.0894
#> 7 C         3    0.0629
#> 8 C         3    0.0720
#> 9 C         3    0.0996

Created on 2021-06-15 by the reprex package (v0.3.0)
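
Because the division is done row by row, the grouping step is not strictly needed: mutate() after the join gives the same ratios and keeps every column. To my knowledge, dplyr 1.1.0 and later also warn when summarise() returns more than one row per group and suggest reframe() instead, so mutate() may be the safer choice on current versions. A minimal sketch, assuming the same made-up data as above (the seed is arbitrary, and the vectors are written out at full length rather than relying on recycling):

```r
library(dplyr)

set.seed(1)  # arbitrary seed so the made-up case counts are reproducible

# Case data (simulated, as in the answer above)
df <- data.frame(
  state = rep(c("A", "B", "C"), times = 3),
  time  = rep(c(1, 2, 3), times = 3),
  cases = rnorm(n = 9, mean = 100, sd = 50)
)

# Population data
pop <- data.frame(
  state = c("A", "B", "C"),
  population = c(1000, 2000, 1500)
)

# Join first, then divide row by row; no group_by() or summarise() required
rel <- df %>%
  left_join(pop, by = "state") %>%
  mutate(rel_cases = cases / population)

rel
```

Passing by = "state" explicitly also silences the "Joining, by" message.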
