The set of variables used for weighing up changes the resulting estimates
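While comparing how the set of variables passed to the svyby function affects the resulting estimates and standard errors, I found that weighing up a single variable and weighing up two variables give identical estimates, but weighing up multiple variables gives estimates that are considerably lower than the other two approaches.
What causes this, and how can I avoid it?
Link to the dataset: https://drive.google.com/open?id=1xqFxUBLZifaz57yvoNFOcvhBDGuHuSMq
Here is my code: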
library(tidyverse)
library(survey)

# Loads the survey design object des2004small
load("des2004small.RData")

# Estimate state-level totals for the given variables
weighUp <- function(variables) {
  svyby(formula = make.formula(variables), by = ~statefip,
        design = des2004small,
        FUN = svytotal, na.rm = TRUE)
}

# Weigh up a single variable:
dfstate2004_singleVariable = weighUp(c("race_acs"))

# Weigh up two variables:
dfstate2004_twoVariables = weighUp(c("race_acs", "cvap_acs"))

# Weigh up multiple variables:
dfstate2004_multipleVariables = weighUp(c("race_acs", "cit_acs",
  "educ_acs", "unemployed_acs", "labforce_acs", "poverty_acs", "cvap_acs"))

# Compare the three different methods:
comparison2004 = dfstate2004_singleVariable %>%
  inner_join(dfstate2004_twoVariables, by = "statefip", suffix = c(".single", ".two")) %>%
  inner_join(dfstate2004_multipleVariables, by = "statefip", suffix = c("", ".multiple"))

race_acswhite2004 = comparison2004 %>%
  select(statefip,
         single = race_acswhite.single,
         two = race_acswhite.two,
         multiple = race_acswhite)

race_acswhite2004
Here are the differing estimates:
+------------------------------------------+
|    statefip   single      two   multiple |
+------------------------------------------+
| 1         1  3084123  3084123    2128346 |
| 2         2   427008   427008     277075 |
+------------------------------------------+
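The variables in the 'multiple' table have missing values, and svytotal drops any observation that has a missing value in any of the variables it is analysing. Well, by default it gives NA results, but if you ask it to remove missing values with na.rm=TRUE, it removes them along with the whole observation. That is why the race_acs totals shrink once variables with more missing values are added to the formula.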
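
To see the effect in isolation, and one way to avoid it, here is a minimal sketch. It does not use the data from the question: it assumes the api example data that ships with the survey package, and meals_gappy is a hypothetical variable into which missing values are punched by hand. The idea is simply to call svyby once per variable, so that each total is computed from every observation where that particular variable is present.

```r
library(survey)

# Example data shipped with the survey package (NOT the asker's dataset).
data(api)

# Hypothetical variable: a copy of meals with every 4th value set to NA.
apistrat$meals_gappy <- apistrat$meals
apistrat$meals_gappy[seq(1, nrow(apistrat), by = 4)] <- NA

des <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                 data = apistrat, fpc = ~fpc)

# enroll on its own: only rows where enroll itself is missing are dropped.
alone <- svyby(~enroll, by = ~stype, design = des,
               FUN = svytotal, na.rm = TRUE)

# enroll together with the gappy variable: any row with an NA in
# EITHER variable is dropped, so the enroll totals shrink as well.
joint <- svyby(~enroll + meals_gappy, by = ~stype, design = des,
               FUN = svytotal, na.rm = TRUE)

# Workaround: one svyby call per variable, results kept in a named list.
vars <- c("enroll", "meals_gappy")
separate <- lapply(setNames(vars, vars), function(v) {
  svyby(make.formula(v), by = ~stype, design = des,
        FUN = svytotal, na.rm = TRUE)
})

# Side-by-side comparison of the enroll totals per stratum.
data.frame(stype    = alone$stype,
           alone    = alone$enroll,
           joint    = joint$enroll,
           separate = separate$enroll$enroll)
```

In this sketch the separate column should match alone, while joint comes out lower. The trade-off is that per-variable totals are each based on a different set of observations, so they are not directly comparable as shares of a common base; if that matters, restricting the design to complete cases (or imputing) before estimating may be the better choice.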