SAS: Imputing the Mean for Missing Questionnaire Data if Less Than 80% of Values Are Missing

Question

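I have a survey questionnaire coded 1-5, with (.) marking a missing value. How can I code the data to reflect the following:

If a patient has >=80% of the values non-missing, the missing values should be coded as the mean of the questions that were answered. If the patient is missing more than 80% of the values, then rather than setting the measure summary to missing for the patient, drop the record.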

data condomuse;
set int108;
run;

proc means data=condomuse n nmiss missing;
var cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;
by Intround sid;
run;

Answer 1

Using the following assumptions:

Each line/record is a unique person
All variables are numeric

NMISS(), N(), CMISS() and DIM() are functions that can work with arrays.
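As a small illustration of how these functions behave on an array, here is a sketch on made-up data. The data set name `demo` and the variables `q1`-`q5` are hypothetical and are not part of the original question.

```sas
/* Hypothetical toy data: two respondents, five items each */
data demo;
    input q1-q5;

    /* group the items so they can be passed to functions as a list */
    array q(*) q1-q5;

    n_items    = dim(q);          /* number of elements in the array */
    n_answered = n(of q(*));      /* count of non-missing numeric values */
    n_missing  = nmiss(of q(*));  /* count of missing numeric values */
    c_missing  = cmiss(of q(*));  /* like NMISS, but also accepts character values */

    pct_missing = n_missing / n_items;

    datalines;
1 2 . 4 5
. . . . 3
;
run;

proc print data=demo;
run;
```

The first respondent has one of five items missing (20%), and the second has four of five missing (80%).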

The following step flags every record that is missing 80% or more of its values.

data temp; *temp is output data set name;
    set have; *have is input data set name;

    *create an array to avoid listing all variables later;
    array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;

    *calculate percent missing;
    Percent_Missing = NMISS(of vars_check(*)) / Dim(vars_check);

    if percent_missing >= 0.8 then exclude = 'Y';
    else exclude = 'N';

 run;
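Before imputing, it may help to check how many records the flag would exclude. A minimal sketch, assuming the data set `temp` and the variables `exclude` and `Percent_Missing` created in the step above:

```sas
/* count how many records are flagged for exclusion */
proc freq data=temp;
    tables exclude;
run;

/* summarize the share of missing items within each group */
proc means data=temp n min mean max;
    class exclude;
    var Percent_Missing;
run;
```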

To replace missing values with the mean, or with another measure, PROC STDIZE can do it.

*temp is input data set name from previous step;
proc stdize data=temp out=temp_mean reponly method=mean;
*keep only records with less than 80% missing;
where exclude = 'N';

*list of vars to fill with mean;
VAR cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;

run;

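The different standardization methods are listed here, but note that these are standardization methods rather than imputation methods.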
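One thing to keep in mind: with REPONLY and METHOD=MEAN, PROC STDIZE fills each variable's missing values with that variable's mean across records. If the goal is instead the mean of each respondent's own answered questions, as the question's wording suggests, a data step along these lines might work. This is only a sketch: the output name `temp_rowmean` and the helper variables `n_items`, `n_miss`, `row_mean` and `i` are made up for illustration, and it reuses the 80% threshold from the flagging step.

```sas
data temp_rowmean; /* hypothetical output data set name */
    set have; /* have is the input data set name, as above */

    array vars_check(*) cusesability CUSESPurchase CUSESCarry CUSESDiscuss CUSESSuggest CUSESUse CUSESMaintain CUSESEmbarrass CUSESReject CUSESUnsure CUSESConfident CUSESComfort CUSESPersuade CUSESGrace CUSESSucceed;

    n_items = dim(vars_check);
    n_miss  = nmiss(of vars_check(*));

    /* drop records missing 80% or more of the items */
    if n_miss / n_items >= 0.8 then delete;

    /* mean of the items this respondent actually answered */
    row_mean = mean(of vars_check(*));

    /* fill each missing item with the respondent's own mean */
    do i = 1 to n_items;
        if missing(vars_check(i)) then vars_check(i) = row_mean;
    end;

    drop i n_items n_miss;
run;
```

The imputed values will generally not be whole numbers on the 1-5 scale. Whether to round them depends on how the summary score is used.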

survey

sas

mean

missing-data