Recoding Categorical Variable (Stata)
I am trying to collapse the categorical variable income_change from 5 groups into 3.
The variable currently looks like this:
tab income_change

             income_change | frequency
Decreased by more than 25% |       333
        Decreased by 1-25% |       331
           Stayed the same |       222
        Increased by 1-25% |        23
Increased by more than 25% |        12
The variable is stored as:
                 storage   display    value
variable name    type      format     label      variable label
------------------------------------------------------------------
income_change    int       %26.0g     Lchg
To create three groups from the five categories above, I ran the following, but I got the error message "type mismatch":
gen perc_change = income_change
recode perc_change ="Income Decreased" if perc_change =="1" | if perc_change =="2"
recode perc_change ="Same Income" if perc_change =="3"
recode perc_change ="Income Increased" if perc_change =="4" | if perc_change =="5"
The perc_change variable is stored as follows:
                 storage   display    value
variable name    type      format     label
------------------------------------------------------------------
perc_change      float     %9.0g
Solved using the solution suggested below:
gen inc_change = income_change
gen inc_perc_change = ""
replace inc_perc_change ="Income Decreased" if inc_change == 1 | inc_change == 2
replace inc_perc_change ="Same Income" if inc_change == 3
replace inc_perc_change ="Income Increased" if inc_change == 4 | inc_change == 5
tab inc_perc_change
This produced the graph I was looking for:
catplot tn_cor22_str inc_perc_change, percent(tn_cor22_str)
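It seems that income_change is a numeric variable with text value labels. Comparing a numeric variable with strings such as "1", or assigning strings to it, is what triggers the "type mismatch" error. Could you try something like this: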
gen perc_change = ""
replace perc_change ="Income Decreased" if income_change == 1 | income_change == 2
replace perc_change ="Same Income" if income_change == 3
replace perc_change ="Income Increased" if income_change == 4 | income_change == 5
tab perc_change
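If the code above does not work, the values of income_change are most likely not 1 through 5. In that case, replace 1-5 with the actual values in your data so that the conditions are correct.
Alternatively, you can use: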
gen perc_change = ""
replace perc_change ="Income Decreased" if inrange(income_change, 1, 2)
replace perc_change ="Same Income" if income_change == 3
replace perc_change ="Income Increased" if inrange(income_change, 4, 5)
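Whether the conditions above should use 1 through 5 depends on the numeric codes stored behind the labels. A minimal sketch for checking them, assuming the value label is named Lchg as in the describe output from the question:

```stata
* List the numeric code attached to each text label
label list Lchg

* Tabulate the underlying numbers instead of the labels
tabulate income_change, nolabel

* Show the storage type, range, and labeled values of the variable
codebook income_change
```

If the codes turn out to be something other than 1 through 5, the values in the if conditions need to be adjusted to match.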
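Although you got the result you wanted, the generated variable is not ideal because, in particular, it will not even sort the way you want: string values sort alphabetically. Another possibility is a coarsened numeric variable with new value labels, for example: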
gen change_class = 1 if inlist(income_change, 1, 2)
replace change_class = 2 if income_change == 3
replace change_class = 3 if inlist(income_change, 4, 5)
label def change_class 1 Decreased 2 Same 3 Increased
label val change_class change_class
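Since the question started from recode, the same coarsened numeric variable could probably also be built in one step with recode's generate() option, which accepts a value label inside each rule. This is only a sketch: it assumes income_change is coded 1 through 5 in the order shown by tab, and the name change_class2 is a hypothetical one chosen here to avoid clashing with change_class.

```stata
* Assumption: codes 1-5 follow the order of the categories in the tab output
* change_class2 is a hypothetical name, not from the original posts
recode income_change (1 2 = 1 "Decreased") (3 = 2 "Same") (4 5 = 3 "Increased"), generate(change_class2)

* The new variable is numeric, so the categories sort as 1, 2, 3
tabulate change_class2
```

The original income_change is left unchanged, and the labels given in the rules are attached to the new variable as a value label.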