Getting NAs when analysing a dataset with both the survey and srvyr packages in R?
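This is my first post (and I'm a real beginner with R), so please go easy on me...
I'm trying to analyse the Australian Electoral Study dataset in R. It is a survey of a nationally representative sample of Australian voters, conducted after Australian federal elections (surprise).
Like other datasets of its kind, it uses weights to ensure the national population is adequately represented.
When I analyse this data in R with either the srvyr package or the survey package, it outputs only NAs instead of the statistics I'm looking for.
For example, when I try to find the percentage of respondents giving each answer to variable A1 (see the code at the bottom of the post if you'd like to reproduce this), I get this output: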
# A tibble: 5 x 5
  A1           proportion proportion_se total total_se
  <fct>             <dbl>         <dbl> <dbl>    <dbl>
1 A good deal          NA           NaN    NA      NaN
2 Some                 NA           NaN    NA      NaN
3 Not much             NA           NaN    NA      NaN
4 None                 NA           NaN    NA      NaN
5 Item skipped         NA           NaN    NA      NaN
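Obviously not ideal.
I'm not sure what I'm doing wrong, so any help would be great. Thanks very much in advance... and apologies for the long code block (if I knew where I was going wrong, I'd only have copied that chunk, I promise!). Here is my code so far: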
## getting the gang back together
library(tidyverse)
library(dplyr)
library(ggplot2)
library(srvyr)
library(survey)
library(haven)
download.file("http://legacy.ada.edu.au/ADAData/data/aes_2016_01365.sav", "AES_2016.sav")
aes_2016 <- read_spss("AES_2016.sav")
## cleaning the data.frame such that variables are factors
aes_2016_clean <- aes_2016
for (i in seq_along(aes_2016)) {
  try(aes_2016_clean[[i]] <- as_factor(aes_2016[[i]]))
}
## loading up the survey design in both srvyr and survey using the wt_enrol weights
aes_2016_srvyr <- as_survey_design(aes_2016_clean, ids = 1, weights = wt_enrol)
aes_2016_survey <- svydesign(id = ~1, weights = ~wt_enrol, data = aes_2016_clean)
## attempting to get proportion of respondents' answers to variable A1 in both srvyr and survey
aes_2016_srvyr %>%
  group_by(A1) %>%
  summarize(proportion = survey_mean(),
            total = survey_total())
svymean(~A1, aes_2016_survey)
There are NAs in the data. You have to decide how to deal with them. This might not be what you want:
aes_2016_srvyr %>%
  group_by(A1) %>%
  summarize(proportion = survey_mean(na.rm = TRUE),
            total = survey_total(na.rm = TRUE))
##   <fct>         <dbl>   <dbl> <dbl> <dbl>
## 1 A good deal  0.337  0.0110   911.  30.8
## 2 Some         0.434  0.0119  1175.  35.7
## 3 Not much     0.181  0.0101   489.  29.2
## 4 None         0.0481 0.00649  130.  18.0
## 5 Item skipped 0      0          0    0
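Before settling on na.rm = TRUE, it may help to see how many values are actually missing and what dropping them does to the denominator. The following is only a rough sketch in base R on a made-up toy data frame (the values and the `toy` object are invented for illustration and are not the AES data); it counts NAs per column and then computes weighted proportions over the complete cases only.

```r
# Hypothetical toy data standing in for aes_2016_clean; all values are invented.
toy <- data.frame(
  A1 = factor(c("Some", "None", NA, "Some", "A good deal", NA),
              levels = c("A good deal", "Some", "Not much", "None")),
  wt_enrol = c(1.2, 0.8, 1.0, NA, 1.5, 0.9)
)

# Count missing values per column before deciding how to handle them.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only rows where both the answer and the weight are observed.
complete <- toy[!is.na(toy$A1) & !is.na(toy$wt_enrol), ]

# Weighted total for each answer among the complete cases.
wtd_totals <- tapply(complete$wt_enrol, complete$A1, sum)
wtd_totals[is.na(wtd_totals)] <- 0  # factor levels with no respondents

# Weighted proportions: the denominator now excludes the dropped rows.
wtd_props <- wtd_totals / sum(wtd_totals)
print(round(wtd_props, 3))
```

On the real data the same check would be something like colSums(is.na(aes_2016_clean[c("A1", "wt_enrol")])). Note that the proportions above are shares of respondents with an observed answer and weight, not of the whole sample, which is exactly the decision you have to make. For the survey package version, svymean() also takes an na.rm argument.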