Recoding a Categorical Variable (Stata)

I am trying to collapse the categorical variable income_change from 5 groups into 3.

The variable currently looks like this:

tab income_change                  frequency
Decreased by more than 25% |        333
        Decreased by 1-25% |        331
           Stayed the same |        222
        Increased by 1-25% |         23
Increased by more than 25% |         12

The variable is stored as:

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------
income_change   int     %26.0g     Lchg

To create three groups from the five categories above, I ran the following, but I get the error message "type mismatch":

gen perc_change = income_change            
recode   perc_change ="Income Decreased"  if perc_change =="1"  | if perc_change =="2"
recode   perc_change ="Same Income"  if perc_change =="3"
recode   perc_change ="Income Increased"  if perc_change =="4" | if perc_change =="5"

The perc_change variable is stored as follows:


              storage   display    value
variable name   type    format     label
-------------------------------------------------------------
perc_change     float   %9.0g

Solved using the solution suggested below:

gen inc_change = income_change 
gen inc_perc_change = ""
replace inc_perc_change ="Income Decreased"  if inc_change == 1 | inc_change == 2
replace inc_perc_change ="Same Income"       if inc_change == 3
replace inc_perc_change ="Income Increased"  if inc_change == 4 | inc_change == 5
tab inc_perc_change 

This produced the graph I was looking for:

catplot  tn_cor22_str inc_perc_change, percent(tn_cor22_str)

It seems that income_change is a numeric variable with text value labels attached. Could you try something like this:

gen perc_change = ""
replace perc_change ="Income Decreased"  if income_change == 1 | income_change == 2
replace perc_change ="Same Income"       if income_change == 3
replace perc_change ="Income Increased"  if income_change == 4 | income_change == 5
tab perc_change 

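If the code above does not work, the values of income_change are most likely not 1 through 5, and you will need to replace 1-5 with the actual codes in your data to get the conditions right.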
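One way to see which numeric codes sit behind the labels is sketched below. It assumes the asker's dataset is already in memory and that the value label is named Lchg, as the describe output in the question shows.

```stata
* List each numeric code next to its text label
label list Lchg

* Tabulate the underlying numeric codes instead of the labels
tabulate income_change, nolabel

* Confirm that the variable is stored as a number, not a string
describe income_change
```

If the codes turn out to be something other than 1 through 5, the conditions in the replace statements would need to use those codes instead.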

Alternatively, you can use:

gen perc_change = ""
replace perc_change ="Income Decreased"  if inrange(income_change, 1, 2)
replace perc_change ="Same Income"       if income_change == 3
replace perc_change ="Income Increased"  if inrange(income_change, 4, 5)
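Whichever variant is used, a quick cross-check of the new variable against the original can catch a code that was missed. This is only a sketch, assuming income_change is the numeric source variable and perc_change is the new string variable created above.

```stata
* Cross-tabulate old against new categories, including missing values
tabulate income_change perc_change, missing

* Count observations that have a nonmissing code but no new category
count if perc_change == "" & !missing(income_change)
```

A count of 0 would suggest that every nonmissing code was assigned to one of the three groups.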

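Although you got the result you wanted, the resulting string variable is not ideal because, in particular, it will not even sort in the order you want (string values sort alphabetically). Another possibility is a coarsened numeric variable with new value labels, such as: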

gen change_class = 1 if inlist(income_change, 1, 2)
replace change_class = 2 if income_change == 3
replace change_class = 3 if inlist(income_change, 4, 5)
label def change_class 1 Decreased 2 Same 3 Increased 
label val change_class change_class
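The same coarsening can probably be done in one step with recode and its generate() option, which is closer to what the question originally attempted. The sketch below builds a small toy dataset so that it runs on its own. The toy data, the assumption that codes 1 through 5 map to the five labels in the order shown in the question, and the variable name change_class3 are all illustrative rather than taken from the asker's data.

```stata
* Toy data for illustration only; clear drops whatever is in memory
clear
input int income_change
1
2
3
4
5
end

* Assumed mapping of codes to labels, mirroring the question
label define Lchg 1 "Decreased by more than 25%" 2 "Decreased by 1-25%" ///
    3 "Stayed the same" 4 "Increased by 1-25%" 5 "Increased by more than 25%"
label values income_change Lchg

* recode maps numeric codes to numeric codes; the quoted text after each
* new code becomes its value label. generate() leaves the original intact.
recode income_change (1 2 = 1 "Decreased") (3 = 2 "Same") ///
    (4 5 = 3 "Increased"), generate(change_class3)

* Check the mapping from five categories to three
tabulate income_change change_class3, missing
```

Because the new variable is numeric with value labels, tables and graphs order its categories by code (Decreased, Same, Increased) rather than alphabetically. It also shows why the original attempt failed: recode works on numeric values, so assigning or comparing strings such as "1" against a numeric variable gives "type mismatch".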