survey package in R: How to set fpc argument (finite population correction)
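I drew some data from a sampling frame using a probability proportional to size (PPS) scheme, sampling 6 strata defined by the combination of two variables, gender and pre, with the following proportions: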
      pre
gender  High   Low Medium
     F 0.155 0.155  0.195
     M 0.155 0.155  0.185
Now I want to use svydesign from the R package "survey" to specify the design of my sampled data. How should I define the fpc (finite population correction) argument?
The documentation says:
For PPS sampling without replacement it is necessary to specify the probabilities for each stage of sampling using the fpc
argument, and an overall weight argument should not be given.
library(survey)
out <- read.csv('https://raw.githubusercontent.com/rnorouzian/d/master/out.csv')
dstrat <- svydesign(id=~1,strata=~gender+pre, data=out, pps = "brewer", fpc = ????)
If we want to add a proportion column, group by 'gender' and 'pre', then create the proportion by dividing each count by the sum of the counts:
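library(dplyr)

out1 <- out %>%
  group_by(gender, pre) %>%
  summarise(n = n(), .groups = 'drop') %>%   # count per gender x pre cell
  mutate(fpc = n/sum(n)) %>%                  # cell share of the whole sample
  right_join(out, by = c("gender", "pre"))    # attach the share to every row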
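To connect this back to the question, the new column can then be handed to svydesign(). This is a minimal sketch, not part of the original answer: it follows the PPS example in the survey documentation, which passes per-unit probabilities as a one-sided formula (fpc = ~p), and it assumes the cell shares computed above are the probabilities you intend to use. The quoted documentation asks for each unit's selection probability, so it is worth checking that these shares match how the sample was actually drawn.

```r
library(dplyr)
library(survey)

out <- read.csv('https://raw.githubusercontent.com/rnorouzian/d/master/out.csv')

# Share of the sample in each gender x pre cell, attached to every row
out1 <- out %>%
  group_by(gender, pre) %>%
  summarise(n = n(), .groups = 'drop') %>%
  mutate(fpc = n / sum(n)) %>%
  right_join(out, by = c("gender", "pre"))

# Supply the probabilities through fpc as a formula and give no weights argument,
# as the documentation quoted in the question requires for PPS without replacement
dstrat <- svydesign(id = ~1, strata = ~gender + pre, data = out1,
                    pps = "brewer", fpc = ~fpc)
dstrat
```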
Or use adorn_percentages from janitor:
library(dplyr)
library(janitor)
library(tidyr)
out1 <- out %>%
  tabyl(gender, pre) %>%
  adorn_percentages(denominator = "all") %>%   # divide each cell by the grand total
  pivot_longer(cols = -gender, names_to = 'pre',
               values_to = 'fpc') %>%
  right_join(out, by = c("gender", "pre"))
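The same cell shares can also be cross-checked in base R, without dplyr, janitor or tidyr. In this sketch prop.table(table(...)) divides every gender x pre count by the grand total, which is what adorn_percentages(denominator = "all") does, and merge() stands in for right_join(). The small data frame toy is hypothetical and only makes the block runnable offline; with the real data you would pass out instead.

```r
# Hypothetical toy data standing in for `out`
toy <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M", "M", "F"),
  pre    = c("High", "Low", "High", "Medium", "Low", "Low", "High", "Medium")
)

# Share of the whole sample falling in each gender x pre cell
shares <- as.data.frame(prop.table(table(gender = toy$gender, pre = toy$pre)),
                        responseName = "fpc", stringsAsFactors = FALSE)

# Attach the cell share to every row, like right_join() above
toy1 <- merge(toy, shares, by = c("gender", "pre"), all.x = TRUE)
toy1
```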
If we need a function:
f1 <- function(dat, grp_cols) {
  dat %>%
    group_by(across(all_of(grp_cols))) %>%
    summarise(n = n(), .groups = 'drop') %>%
    mutate(fpc = n/sum(n)) %>%
    right_join(dat)
}
f1(out, c("gender", "pre"))
#Joining, by = c("gender", "pre")
# A tibble: 200 x 11
# gender pre n fpc no. fake.name sector pretest state email phone
# <chr> <chr> <int> <dbl> <int> <chr> <chr> <int> <chr> <chr> <chr>
# 1 F High 31 0.155 1 Pont Private 1352 NY Pont@...com xxx-xx-6216
# 2 F High 31 0.155 2 Street NGO 1438 CA Street@...com xxx-xx-6405
# 3 F High 31 0.155 3 Galvan Private 1389 NY Galvan@...com xxx-xx-9195
# 4 F High 31 0.155 4 Gorman NGO 1375 CA Gorman@...com xxx-xx-1845
# 5 F High 31 0.155 5 Jacinto Private 1386 CA Jacinto@...com xxx-xx-6237
# 6 F High 31 0.155 6 Shah Public 1384 CA Shah@...com xxx-xx-5723
# 7 F High 31 0.155 7 Randon Private 1360 TX Randon@...com xxx-xx-7542
# 8 F High 31 0.155 8 Koucherik NGO 1439 NY Koucherik@...com xxx-xx-9137
# 9 F High 31 0.155 9 Waters Industry 1414 TX Waters@...com xxx-xx-7560
#10 F High 31 0.155 10 David Industry 1396 CA David@...com xxx-xx-6498
# … with 190 more rows
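Before passing the output of f1() to svydesign(), a few quick checks can catch a bad join or grouping. The helper check_fpc() below is hypothetical (it is not from the survey package or the original answer): it confirms the column lies in (0, 1], takes a single value within each group, and that the per-group values add up to 1, which holds by construction for n/sum(n). The toy data frame is also made up for illustration.

```r
# Hypothetical helper: sanity checks on an fpc column built as n/sum(n)
check_fpc <- function(dat, grp_cols, col = "fpc") {
  p   <- dat[[col]]
  key <- interaction(dat[grp_cols], drop = TRUE)   # one level per observed cell
  c(
    in_range     = all(!is.na(p) & p > 0 & p <= 1),
    one_per_cell = all(tapply(p, key, function(v) length(unique(v)) == 1)),
    sums_to_one  = isTRUE(all.equal(sum(tapply(p, key, `[`, 1)), 1))
  )
}

# Hypothetical toy data; fpc computed the same way as in f1(), using base ave()
toy <- data.frame(gender = c("F", "F", "M", "M", "M"),
                  pre    = c("High", "High", "Low", "Low", "High"))
toy$fpc <- ave(rep(1, nrow(toy)), toy$gender, toy$pre, FUN = sum) / nrow(toy)

check_fpc(toy, c("gender", "pre"))
```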