SAS PROC GENMOD - Why does consistent syntax produce different reference categories for two different binary variables?

Question

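I am running a series of bivariate log-binomial regressions in PROC GENMOD, each with the same outcome and one binary (1/0) predictor. I use exactly the same syntax and swap out only the predictor. In one model the regression compares predictor category 1 with category 0, while in the other it does the opposite. What could be happening?

My predictor variables are:

Housing_Insecure_Dich_BL: 0 = No, 1 = Yes

PrEP_Effic_Risk_Red_binary_BL: 0 = Below 90%, 1 = 90%+

Model 1: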

proc genmod data=full3 descending;
class Housing_Insecure_Dich_BL (ref=first);
model Almost_Always_Take_3m = Housing_Insecure_Dich_BL / dist=bin link=log waldci ;
estimate 'Housing_Insecure_Dich_BL' Housing_Insecure_Dich_BL 1 -1/exp;
run;

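Result: the Class Level Information table lists the values as "Yes No", which means it is comparing Yes to No, i.e. 1 vs. 0. The prevalence ratio makes sense given the raw percentages.

Model 2: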

proc genmod data=full3 descending;
class PrEP_Effic_Risk_Red_binary_BL (ref=first);
model Almost_Always_Take_3m = PrEP_Effic_Risk_Red_binary_BL / dist=bin link=log waldci ;
estimate 'PrEP_Effic_Risk_Red_binary_BL' PrEP_Effic_Risk_Red_binary_BL 1 -1/exp;
run;

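Result: the Class Level Information table lists the values as "Below 90% 90%+", which means it is comparing zero to one. Why does it do this when I specified ref=first, and when exactly the same syntax with a different 1/0-coded variable produces the expected reference category? The prevalence ratio is what I would expect for zero vs. one, but that is not the comparison I want.

I could change the syntax for Model 2 to ref=last or ref="Below 90%", but I would rather understand what is happening and be able to use uniform syntax, since all of my predictors are coded the same way.

Can anyone help?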

Answer 1

Here is an example of what you are probably doing.

proc format;
  value smokef
  0 = 'Nonsmoker'
  1 = 'Smoker'
  ;
  value bpf
  0 = 'Normal BP'
  1 = 'Higher BP'
  ;
  value statusf
  0 = 'Dead'
  1 = 'Alive'
  ;
quit;

data heart;
  set sashelp.heart;
  smokeflag = (smoking ne 0);
  bpflag    = (bp_status ne 'Normal');
  statusflag= (status = 'Alive');
  format 
    smokeflag  smokef.
    bpflag     bpf.
    statusflag statusf.
  ;
run;

proc genmod data=heart;
class smokeflag;
model statusflag = smokeflag;
estimate 'Smokeflag' smokeflag 1 -1/exp;
run;


proc genmod data=heart;
class bpflag;
model statusflag = bpflag;
estimate 'Blood Pressure flag' bpflag 1 -1/exp;
run;

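Notice the same issue: it compares 'Nonsmoker Smoker' (0 1) but 'Higher BP Normal BP' (1 0). That is because the default ordering in GENMOD is order=formatted. N comes before S, but H comes before N. The same thing happens with your variables: 'No' sorts before 'Yes', but '90%+' sorts before 'Below 90%', since digits sort ahead of letters in ASCII.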

You can get the result you want either by changing the formats to include the number (so '1 Smoker', '0 Nonsmoker', etc.) or by using the order=internal option:

proc genmod data=heart;
class smokeflag (ref=first order=internal);
model statusflag = smokeflag;
estimate 'Smokeflag' smokeflag 1 -1/exp;
run;


proc genmod data=heart;
class bpflag (ref=first order=internal);
model statusflag = bpflag;
estimate 'Blood Pressure flag' bpflag 1 -1/exp;
run;

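order=internal tells SAS to order the levels by their unformatted (internal) values, so 0 always comes before 1 regardless of the format labels.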
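The other route, putting the number into the format label, might look roughly like the sketch below. This is only an illustration: the format names smokenumf and bpnumf are made up for this example, and it assumes the heart data set built earlier. Because the labels now start with 0 and 1, the formatted order matches the internal order and the default order=formatted gives consistent results.

```sas
proc format;
  /* hypothetical format names; labels lead with the code so they sort 0, 1 */
  value smokenumf
    0 = '0 Nonsmoker'
    1 = '1 Smoker'
  ;
  value bpnumf
    0 = '0 Normal BP'
    1 = '1 Higher BP'
  ;
run;

proc genmod data=heart;
  /* default order=formatted; ref=first now picks the '0 ...' level */
  class bpflag (ref=first);
  model statusflag = bpflag;
  /* override the stored format for this step only */
  format bpflag bpnumf.;
  estimate 'Blood Pressure flag' bpflag 1 -1 / exp;
run;
```

The trade-off is that the numeric prefix shows up in every table that prints the formatted value, whereas order=internal leaves the labels untouched.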

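Some procedures also support using formats stored with notsorted, but in my testing GLM does not support that (it is usually available wherever preloadfmt is available).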
