How to get SAS tabulate output into an Excel file
Suppose I have this MWE data:
data v;
input var1 $ var2 var3 $;
datalines;
cat 3 yes
sheep 2 no
sheep 3 maybe
pig 3 maybe
goat 3 maybe
cat 2 no
pig 1 no
cat 2 no
pig 1 no
goat 3 no
cat 3 no
cat 2 yes
cat 1 yes
sheep 3 no
cat 2 no
cat 1 maybe
;
run;
I use proc tabulate to count the number of observations for each value. I do this separately for each variable:
proc tabulate data=v;
  class var1;
  table (var1='' all="Total"),(N pctn);
run;
proc tabulate data=v;
  class var2;
  table (var2='' all="Total"),(N pctn);
run;
proc tabulate data=v;
  class var3;
  table (var3='' all="Total"),(N pctn);
run;
I get output that looks like this:
N PctN
cat 8 50.00
goat 2 12.50
pig 3 18.75
sheep 3 18.75
Total 16 100.00
N PctN
1 4 25.00
2 5 31.25
3 7 43.75
Total 16 100.00
N PctN
maybe 4 25.00
no 9 56.25
yes 3 18.75
Total 16 100.00
My question is:
How can I export this to Excel in the following format?
Name Cat 1 N1 N1% Cat 2 N2 N2% Cat 3 N3 N3% Cat 4 N4 N4% Missing % Total Total%
var1 cat 8 50 goat 2 12.5 pig 3 18.75 sheep 3 18.75 0 16 100
var2 1 4 25 2 5 31.25 3 7 43.75 0 16 100
var3 maybe 4 25 no 9 56.25 yes 3 18.75 0 16 100
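In other words, I want each variable to have its own row. Each value of the variable appears in that row, together with its number of observations and its percentage of all observations. The last three columns are extras and not required: the number and percentage of missing observations and the total for the variable. How can I do this?
Note that I am new to SAS. Any improvements to the code are also welcome, for example how to loop or condense the code that produces the tables.
The desired layout is very messy and becomes hard to work with as the number of variables and the number of distinct values grow.
These processing steps will produce that output structure:
- Transpose each row, giving one name/value row per variable
- Get the frequency count of each variable/value combination
- Scan the counts by variable and build a data set of name/value pairs for the next step. Insert a row for the missing case as needed.
- Transpose the name/value pairs into a wide structure
- Export as needed
Example
The fourth variable of the data has some missing values.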
data have;
input var1 $ var2 var3 $ var4;
datalines;
cat 3 yes .
sheep 2 no .
sheep 3 maybe .
pig 3 maybe .
goat 3 maybe 1
cat 2 no 1
pig 1 no 1
cat 2 no 1
pig 1 no 1
goat 3 no 1
cat 3 no 1
cat 2 yes 1
cat 1 yes 1
sheep 3 no 1
cat 2 no 2
cat 1 maybe 1
;
run;
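/* Assumption: the step that builds have_v is not shown in the post; */
/* this view adds the row identifier that the BY statement below needs. */
data have_v / view=have_v;
  set have;
  rowid + 1;
run;
/* Show numeric missing values as blanks while transposing */
options missing = ' ';
/* One output row per (rowid, variable); mixed types make col1 character */
proc transpose data=have_v out=vector1(index=(_name_));
  by rowid;
  var var1 var2 var3 var4;
run;
/* Count each value within each variable, keeping missing as a level */
proc freq noprint data=vector1;
  by _name_;
  table col1 / missing out=freqs;
run;
options missing = '.';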
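data freqs_0;
  set freqs;
  by _name_;
  /* The lengths are assumptions; widen widevalue if values are longer */
  length seqc $8 widename $32 widevalue $40;
  retain nomiss;
  /* Missing sorts first, so the first row tells whether a missing level exists */
  if first._name_ then nomiss = not missing(col1);
  if first._name_ then seq=1; else seq+1;
  seqc = cats(seq);
  if first._name_ and missing(col1) then do;
    seqc = 'missing';
    seq = 0;  /* the first non-missing level is then numbered 1 */
  end;
  if seqc ne 'missing' then do;
    widename = cats("cat_",seqc);
    widevalue = strip(col1);
    output;
  end;
  widename = cats("cat_",seqc,'_COUNT');
  widevalue = cats(COUNT);  /* explicit numeric-to-character conversion */
  output;
  widename = cats("cat_",seqc,'_PERCENT');
  widevalue = cats(PERCENT);
  output;
  /* No missing level for this variable: emit zero-valued missing columns */
  if last._name_ and nomiss then do;
    seqc = 'missing';
    widename = cats("cat_",seqc,'_COUNT');
    widevalue = '0';
    output;
    widename = cats("cat_",seqc,'_PERCENT');
    widevalue = '0';
    output;
  end;
  keep _name_ widename widevalue;
run;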
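/* name= gives the automatic name column its own name, distinct from */
/* the BY variable _name_; that column is then dropped from the output. */
proc transpose data=freqs_0 out=wide(drop=_source_) name=_source_;
  by _name_;
  id widename;
  var widevalue;
run;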
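The last step in the list, the export, is not shown in the example. Here is a minimal sketch, assuming SAS 9.4 with the ODS EXCEL destination available. The output path and the sheet name are placeholders, not values from the original post. The data set `wide` is the result of the final transpose.

```sas
/* Assumption: the file path and sheet name are placeholders; adjust both. */
ods excel file="/path/to/freq_wide.xlsx" options(sheet_name="freqs");

/* One row per variable, one column per cat_* name/value pair */
proc print data=wide noobs;
run;

ods excel close;
```

If SAS/ACCESS Interface to PC Files is licensed, `proc export data=wide outfile="/path/to/freq_wide.xlsx" dbms=xlsx replace; run;` should write the same data set directly, without going through a report procedure.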