How to calculate probability in R
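Hi, I'm taking a statistics class. We were given the dataset "NHANES", and we filtered it to get adult smokers (adults with a non-missing SmokeNow value), saved as "NHANES_adult".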
library(NHANES)
library(dplyr) # provides %>%, distinct() and filter()
# create a NHANES dataset without duplicated IDs
NHANES <- NHANES %>%
  distinct(ID, .keep_all = TRUE)
NHANES_adult <- NHANES %>%
  filter(Age >= 18) %>% # only include individuals 18 or older
  filter(!is.na(SmokeNow)) # drop any observations with NA for SmokeNow
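Missing values in R are NA, not the text "NA". A comparison such as SmokeNow != 'NA' therefore returns NA (not FALSE) for missing entries. dplyr's filter() drops rows whose condition evaluates to NA, so that filter still removes them, but !is.na() states the intent directly and also behaves correctly with base [ subsetting. The base-R sketch below uses a made-up factor, smoke_now (hypothetical, not real NHANES data), standing in for the SmokeNow column.

```r
# Hypothetical stand-in for NHANES_adult$SmokeNow; values are made up
smoke_now <- factor(c("Yes", NA, "No", "Yes", NA))

# Comparing against the *string* "NA" does not test for missingness:
# a missing value compared with anything gives NA, not FALSE
cmp <- smoke_now != "NA"
print(cmp)  # values: TRUE NA TRUE TRUE NA

# Base subsetting with `[` turns NA positions into NA entries
print(smoke_now[cmp])  # 5 elements, two of them NA

# is.na() checks for missing values directly and never returns NA
keep <- !is.na(smoke_now)
print(keep)  # values: TRUE FALSE TRUE TRUE FALSE
print(smoke_now[keep])  # only the 3 recorded answers
```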
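My professor asked the following question:
1b. Now let's take a single sample of 100 individuals from the NHANES_adult data frame, compute the proportion of smokers, and save it to a variable called p_smokers.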
set.seed(12345) # PROVIDED CODE - this will cause it to create the same
                # random sample each time
sample_size = 100 # size of each sample
p_smokers <- NHANES_adult %>%
  sample(sample_size) %>% # take a sample from the data frame [I think this is okay]
  ____(____ = ____(____)) %>% # compute the probability of smoking [This is the point at which I'm struggling to understand what one-line function fits these blank parameters.]
  ____() # extract the variable from the data frame [I believe this is the mutate() function?]
p_smokers
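Maybe this is what you're looking for. It looks like you should use sample_n() instead of sample(): base R's sample() treats a data frame as a list of columns, so it would try to draw 100 columns rather than 100 rows, whereas sample_n() draws rows. To compute the proportion in one line, you can use mean() on a logical comparison, since TRUE counts as 1 and FALSE as 0.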
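set.seed(12345) # same random sample each time
sample_size <- 100 # size of the sample
p_smokers <- NHANES_adult %>%
  sample_n(sample_size) %>% # take 100 random rows
  summarize(p_smok = mean(SmokeNow == "Yes")) %>% # proportion of "Yes" answers
  pull(p_smok) # extract the single value as a plain number
p_smokers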
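To see why the two suggestions in this answer matter (sample_n() instead of sample(), and mean() for the proportion), here is a base-R sketch on a hypothetical toy data frame, toy, whose SmokeNow values are randomly generated and are not NHANES data. It shows that sample() on a data frame picks columns, how to sample rows instead, and that the mean of a logical vector is a proportion.

```r
set.seed(12345)

# Hypothetical toy data standing in for NHANES_adult (random values, not NHANES)
toy <- data.frame(
  ID = 1:500,
  SmokeNow = factor(sample(c("Yes", "No"), 500, replace = TRUE))
)

# base::sample() treats a data frame as a list of columns, so asking for
# 100 elements means 100 columns; toy has only 2, so this errors
res <- tryCatch(sample(toy, 100), error = function(e) conditionMessage(e))
print(res)

# Sampling rows instead: draw 100 row indices without replacement, then subset
rows <- toy[sample(nrow(toy), 100), ]

# mean() of a logical vector is the share of TRUE values,
# because TRUE is treated as 1 and FALSE as 0
print(mean(c(TRUE, FALSE, TRUE, TRUE)))  # 0.75

# Proportion of "Yes" answers in the 100 sampled rows
p_smokers <- mean(rows$SmokeNow == "Yes")
print(p_smokers)
```

In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), so NHANES_adult %>% slice_sample(n = sample_size) is the current equivalent of the sample_n() step.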