Proportion of the variables in each observation that satisfy certain conditions in SAS

Question

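HAVE is a SAS data set with 1,700 observations and about 1,000 variables. Besides id, there are three "types" of variables, distinguished by their prefixes. Here is a subset of the file: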

id    a_dog b_dog c_dog a_cat b_cat c_cat a_mouse b_mouse c_mouse ...
prsn1     1    -1    -2     2     2     0       1       4       1   
prsn2    -1    -3     4     2     2    -1       0      -1      -1   
...

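For each observation, I need to compute the proportion of values that are greater than, less than, and equal to 0, separately for each variable type (i.e., a_, b_, or c_). The solution should append these new variables to the file: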

... prop_a_gt0 prop_a_lt0 prop_a_eq0 prop_b_gt0 prop_b_lt0 prop_b_eq0 prop_c_gt0 prop_c_lt0 prop_c_eq0
...     1.0000     0.0000     0.0000     0.6667     0.3333     0.0000     0.3333     0.3333     0.3333
...     0.3333     0.3333     0.3333     0.3333     0.6667     0.0000     0.3333     0.6667     0.0000

Note that prop_b_gt0, for example, is 0.6667 for prsn1 because two of the three b_ variables in the prsn1 row have values greater than 0.

I'm not sure how to do this systematically. Perhaps there is a way to combine arrays with a proc sql step? Any solution is welcome!

Answer 1

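With an array, you need to loop over its elements and count the values that are greater than zero (and probably also count the non-missing values, to use as the denominator).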

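data want;
  set have;
  array a a_: ;   /* every variable whose name starts with a_ */
  numerator = 0;
  denominator = 0;
  do index = 1 to dim(a);
    /* a comparison evaluates to 1 (true) or 0 (false) */
    numerator = sum(numerator, a[index] > 0);
    /* count only the non-missing values */
    denominator = sum(denominator, not missing(a[index]));
  end;
  /* leave the proportion missing when every a_ value is missing */
  if denominator > 0 then prop_a_gt0 = numerator / denominator;
  drop index numerator denominator;
run;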

Just repeat the code block for the b_ and c_ variables.
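
As a quick check of the approach, the following sketch rebuilds the two sample rows from the question and applies the same loop to the b_ variables for all three conditions. The data set names have_sample and want_sample and the counter names n_gt, n_lt, n_eq, and n_nonmiss are made up for this illustration.

```sas
/* Two-row sample copied from the question (dog, cat, mouse columns only) */
data have_sample;
  input id $ a_dog b_dog c_dog a_cat b_cat c_cat a_mouse b_mouse c_mouse;
  datalines;
prsn1  1 -1 -2 2 2  0 1  4  1
prsn2 -1 -3  4 2 2 -1 0 -1 -1
;
run;

data want_sample;
  set have_sample;
  array b b_: ;   /* b_dog, b_cat, b_mouse */
  n_gt = 0; n_lt = 0; n_eq = 0; n_nonmiss = 0;
  do index = 1 to dim(b);
    if not missing(b[index]) then do;
      n_gt = n_gt + (b[index] > 0);   /* each comparison yields 1 or 0 */
      n_lt = n_lt + (b[index] < 0);
      n_eq = n_eq + (b[index] = 0);
      n_nonmiss = n_nonmiss + 1;
    end;
  end;
  /* proportions stay missing if every b_ value is missing */
  if n_nonmiss > 0 then do;
    prop_b_gt0 = n_gt / n_nonmiss;
    prop_b_lt0 = n_lt / n_nonmiss;
    prop_b_eq0 = n_eq / n_nonmiss;
  end;
  drop index n_gt n_lt n_eq n_nonmiss;
run;

proc print data=want_sample noobs;
  var id prop_b_: ;
  format prop_b_: 6.4;
run;
```

For prsn1 this should print 0.6667, 0.3333, and 0.0000 for prop_b_gt0, prop_b_lt0, and prop_b_eq0, matching the values requested in the question.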

Answer 2

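When there are three or more arrays (grouped by variable-name prefix a_, b_, c_), a macro helps ensure that no typos or stray edits creep in while copying and pasting (duplicating) the code.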

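Suppose a macro, compute_proportions, emits code that loops over a variable array defined in the data step. The generated code counts how often each condition is met during the loop and computes the proportions after the loop.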

* simulate data;

data have;
  array a a_1-a_300;  * for simplicity, presume 1 to 300 correspond to dog, cat, mouse, ...;
  array b b_1-b_300;
  array c c_1-c_300;

  call streaminit(123);

  do id = 1 to 10;
    do _n_ = 1 to dim(a);
      a (_n_) = ceil(9 * rand('uniform')) - 5;  * random integer from -4 to 4;
      b (_n_) = ceil(9 * rand('uniform')) - 5;
      c (_n_) = ceil(9 * rand('uniform')) - 5;
    end;
    output;
  end;
run;

%macro compute_proportions(array=, prefix=);

  _lt = 0; %* <0 count;
  _eq = 0; %* =0 count;
  _gt = 0; %* >0 count;
  _n  = 0;

  do _index = 1 to dim(&array);

    _v = &array(_index); %* use the loop index, not the automatic variable _n_;

    if not missing(_v) then do;
      _lt + (_v < 0); %* sum statement: adds 1 when the comparison is true;
      _eq + (_v = 0);
      _gt + (_v > 0);
      _n + 1;
    end;

  end;

  if _n > 0 then do;
    %* prefix=a_ yields prop_a_lt0 etc., so a later a_: list does not pick these up;
    prop_&prefix.lt0 = _lt / _n;
    prop_&prefix.eq0 = _eq / _n;
    prop_&prefix.gt0 = _gt / _n;
  end;

  drop _lt _eq _gt _index _v _n;
%mend;

data want;
  set have;

  array a a_:; * all variables whose names start with a_ can be array referenced during step;
  array b b_:;
  array c c_:;

  %compute_proportions (array=a, prefix=a_)
  %compute_proportions (array=b, prefix=b_)
  %compute_proportions (array=c, prefix=c_)
run;
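
The counting inside the macro relies on SAS evaluating a comparison to 1 (true) or 0 (false), so a sum statement can accumulate the hits. A minimal sketch of that idiom in isolation, using a small temporary array with made-up values (the names v, gt_count, n_valid, and prop_gt0 are only for this illustration):

```sas
data _null_;
  /* five hypothetical values, one of them missing */
  array v[5] _temporary_ (-2 0 3 . 1);

  do i = 1 to dim(v);
    if not missing(v[i]) then do;
      gt_count + (v[i] > 0);   /* sum statement: adds 1 only when the value is positive */
      n_valid + 1;             /* number of non-missing values */
    end;
  end;

  /* guard against an all-missing array */
  if n_valid > 0 then prop_gt0 = gt_count / n_valid;

  put gt_count= n_valid= prop_gt0=;   /* log shows: gt_count=2 n_valid=4 prop_gt0=0.5 */
run;
```

The missing value is skipped, so the denominator is 4 rather than 5. Setting options mprint; before the data want step writes the statements generated by each %compute_proportions call to the log, which makes it easy to confirm that the macro expands to this same pattern.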
