Calculate percentage/frequency of a value in a survey object
I have a national survey composed of many variables, like this one (I omit some variables for brevity):
year id y.b sex income married pens weight
2002 1 1950 F 100000 1 0 1.12
2002 2 1943 M 55000 1 1 0.55
2004 1 1950 F 88000 1 1 1.1
2004 2 1943 M 66000 1 1 0.6
2006 3 1966 M 12000 0 1 0.23
2008 3 1966 M 24000 0 1 0.23
2008 4 1972 F 33000 1 0 0.66
2010 4 1972 F 35000 1 0 0.67
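where id is the respondent, y.b is the year of birth, married is a dummy variable (1 married, 0 single), pens is a dummy variable that takes the value 1 if the person invests in a complementary pension scheme, and weight is the survey weight.
Consider that the original survey consists of 40k observations from 2002 to 2014 (I filtered it to keep only individuals who appear more than once). I use this command to create a survey object: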
d.s <- svydesign(ids=~1, data=df, weights=~weight)
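Now that df is weighted, I would like to find, for example, the percentage of women who invest in a complementary pension scheme, or the percentage of married people. I have read the R help and searched the web for a command to get percentages, but I did not find the right one.
I'm not entirely sure what you want to do with weight, but here is a very simple solution for the proportion of women with a pension, using dplyr: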
library(survey)  # svydesign()
library(dplyr)   # %>% and filter()
df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                 married = c(1, 1, 1, 1, 0, 0, 1, 1),
                 pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))
d.s <- svydesign(ids = ~1, data = df, weights = ~weight)
# data frame of women with a pension
women_with_pension <- d.s$variables %>%
filter(sex == 'F' & pens == 1)
# number of rows (i.e. number of women with a pension) in that df
n_women_with_pension <- nrow(women_with_pension)
# data frame of all women
all_women <- d.s$variables %>%
filter(sex == 'F')
# number of rows (i.e. number of women) in that df
n_women <- nrow(all_women)
# divide the number of women with a pension by the total number of women
proportion_women_with_pension <- n_women_with_pension/n_women
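This gives you the basic (unweighted) proportion of women with a pension. Apply the same logic to get the percentage of married people.
As for the weight variable, are you trying to compute some kind of weighted proportion? In that case, you can sum the weight values of the women in each class (women with a pension, and all women), like this: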
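As a rough base R sketch of the same unweighted logic, assuming the sample df from above: the mean of a logical vector is the share of TRUE values, so no filtering or row counting is needed. This ignores weight entirely.

```r
# sample data copied from the answer above
df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                 married = c(1, 1, 1, 1, 0, 0, 1, 1),
                 pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))

# unweighted share of women who have a pension (1 of 4 women)
share_women_pension <- mean(df$pens[df$sex == 'F'] == 1)  # 0.25

# unweighted share of married people (6 of 8 rows)
share_married <- mean(df$married == 1)  # 0.75
```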
# total weight of women with a pension
women_with_pension <- d.s$variables %>%
  filter(sex == 'F' & pens == 1) %>%
  summarise(total_weight = sum(weight))
# extract the summed weight as a plain number
women_with_pension_weight <- women_with_pension[[1]]
# total weight of all women
all_women <- d.s$variables %>%
  filter(sex == 'F') %>%
  summarise(total_weight = sum(weight))
# extract the summed weight as a plain number
all_women_weight <- all_women[[1]]
# divide the weight of women with a pension by the total weight of all women
# 0.3098592 for this sample data
prop_weight_women_with_pension <- women_with_pension_weight/all_women_weight
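Dividing one weight sum by another is the same as taking a weighted mean of the 0/1 pens variable, so base R's weighted.mean() can serve as a quick cross-check. This is a sketch on the sample data only; it gives a point estimate and no standard error.

```r
# sample data copied from the answer above
df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                 married = c(1, 1, 1, 1, 0, 0, 1, 1),
                 pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))

# keep only the rows for women
women <- df[df$sex == 'F', ]

# weighted share of women with a pension: sum(weight * pens) / sum(weight)
prop_weighted <- weighted.mean(women$pens, w = women$weight)  # 0.3098592

# weighted share of married people in the whole sample
prop_married_weighted <- weighted.mean(df$married, w = df$weight)
```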
# same setup
library(survey)
df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
married = c(1,1,1,1,0,0,1,1),
pens = c(0, 1, 1, 1, 1, 1, 0, 0),
weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))
d.s <- svydesign(ids=~1, data=df, weights=~weight)
# subset to women only then calculate the share with a pension
svymean( ~ pens , subset( d.s , sex == 'F' ) )
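The same approach should extend to the other quantity in the question. The sketch below assumes the ids=~1 design from the question and uses svymean() for the share of married people and svyby() to get the pension share for each sex in one call; the object names married_share and pens_by_sex are only illustrative.

```r
library(survey)

# sample data and design copied from the answer above
df <- data.frame(sex = c('F', 'M', 'F', 'M', 'M', 'M', 'F', 'F'),
                 married = c(1, 1, 1, 1, 0, 0, 1, 1),
                 pens = c(0, 1, 1, 1, 1, 1, 0, 0),
                 weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))
d.s <- svydesign(ids = ~1, data = df, weights = ~weight)

# weighted share of married people in the whole sample, with a standard error
married_share <- svymean(~married, d.s)

# weighted share with a pension, estimated separately for each sex
pens_by_sex <- svyby(~pens, ~sex, d.s, svymean)

# coef() extracts the point estimate; multiply by 100 for a percentage
married_pct <- 100 * coef(married_share)
```

Because married and pens are 0/1 dummies, their weighted means are the weighted proportions, which is why svymean() is enough here.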