How to calculate probability in R

Question

Hello, I'm taking a statistics class. We were given a dataset, "NHANES", and we filtered it to get adult smokers --> "NHANES_adult".

library(NHANES)
library(dplyr)  # provides %>%, distinct(), and filter()

# create a NHANES dataset without duplicated IDs
NHANES <-
  NHANES %>%
  distinct(ID, .keep_all = TRUE)

NHANES_adult <- NHANES %>%
  filter(Age >= 18) %>%  # only include individuals 18 or older
  filter(!is.na(SmokeNow))  # drop any observations with NA for SmokeNow

My professor asked the following question:

1b. Now let's take a single sample of 100 individuals from the NHANES_adult data frame and compute the proportion of smokers, saving it to a variable called p_smokers.

set.seed(12345)  # PROVIDED CODE - this will cause it to create the same
                 # random sample each time

sample_size = 100 # size of each sample

p_smokers <- NHANES_adult %>%
  sample(sample_size) %>%  # take a sample from the data frame [I think this is okay]
  ____(____ = ____(____)) %>% # compute the probability of smoking [This is where I'm stuck: I can't work out which one-line function fits these blanks.]
  ____()  # extract the variable from the data frame [I believe this is the mutate() function?]

p_smokers

Answer 1

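Maybe this is what you are looking for. It looks like you should use sample_n() rather than sample(). To compute the proportion in one line you can use mean(): the comparison SmokeNow == "Yes" returns a logical vector, and the mean of a logical vector is the share of TRUE values.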

sample_size <- 100

NHANES_adult %>%
  sample_n(sample_size) %>%  
  summarize(p_smok = mean(SmokeNow == "Yes")) %>% 
  pull(p_smok)
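
To see why each step is there, here is a minimal sketch on a made-up data frame. The `toy` tibble and its values are hypothetical stand-ins for NHANES_adult, used only so the example runs without the NHANES package. It shows that mean() of a logical vector is a proportion, that base sample() applied to a data frame draws columns rather than rows, and that pull() (not mutate()) is what turns the one-cell summary into a plain number.

```r
library(dplyr)

# Hypothetical stand-in for NHANES_adult: 10 rows, 4 of them current smokers.
toy <- tibble(
  ID = 1:10,
  SmokeNow = factor(c("Yes", "No", "No", "Yes", "No",
                      "Yes", "No", "No", "Yes", "No"))
)

# The comparison yields TRUE/FALSE; mean() counts TRUE as 1 and FALSE as 0,
# so this is the share of rows where SmokeNow is "Yes".
p_all <- mean(toy$SmokeNow == "Yes")

# base::sample() treats a data frame as a list of columns, so it returns
# randomly chosen columns (all 10 rows), not randomly chosen rows.
cols <- sample(toy, 1)

set.seed(12345)  # make the random draw reproducible
p_toy <- toy %>%
  sample_n(5) %>%                                   # draw 5 rows at random
  summarize(p_smok = mean(SmokeNow == "Yes")) %>%   # 1 x 1 data frame
  pull(p_smok)                                      # extract the bare number
```

mutate() would add a column and still return a data frame, which is why it does not fit the last blank. In dplyr 1.0.0 and later, slice_sample(n = sample_size) is the newer equivalent of sample_n(), though the exercise's blanks appear to expect the pipeline above.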

statistics

r

probability

dplyr

tidyverse