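SAS conditional sum into new field
I am new to SAS and have a simple dataset called ORIG_DATA, from which I need to create a new dataset, SUMMARY, that shows the total for each Salesman_ID and Day_ID.
Essentially, the SUMMARY output should look like the following, where the numbers are the sums of total.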
Salesman_ID|Day_1|Day_2
A |30 |40
B |60 |0
C |20 |70
In SQL, I would write:
Select salesman_id,
sum(case when day_id=1 then total else 0 end) as day_1,
sum(case when day_id=2 then total else 0 end) as day_2
from ORIG_DATA group by salesman_id
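However, for this problem I am not allowed to use PROC SQL. How else can I do this in SAS? I haven't the foggiest at the moment.
Apologies for the non-tabular format.
ORIG_DATA is as follows: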
Day_ID|Salesman_ID|Other_field|total
1 |A |R000 |10
1 |A |R002 |20
2 |A |R000 |10
2 |A |R004 |30
1 |B |R002 |20
1 |B |R000 |40
1 |B |R004 |0
2 |C |R003 |40
2 |C |R004 |10
1 |C |R002 |20
2 |C |R002 |20
How about this? I don't know whether there are only ever two other_field records per salesman_id per day_id. The following will work for 1 to n records:
Input data:
data ORIG_DATA ;
input Day_ID Salesman_ID $ Other_field $ total ;
cards ;
1 A R000 10
1 A R002 20
2 A R000 10
2 A R004 30
1 B R002 20
1 B R000 40
1 B R004 0
2 C R003 40
2 C R004 10
1 C R002 20
2 C R002 20
;run;
Transpose, sum, and transpose back:
proc sort data=ORIG_DATA ;
by salesman_id day_id ;
proc transpose data=ORIG_DATA out=D1 ;
by salesman_id day_id ;
var total ;
run ;
data D2 ;
set D1 ;
array D(*) col: ;
_name_=cats('day_',day_id) ;
by salesman_id day_id;
total=sum(of D(*)) ;
run ;
proc transpose data=D2 out=SUMMARY(drop=_name_) name=_name_;
by salesman_id ;
var total ;
run ;
*Add zeros for missing values ;
data SUMMARY ;
set SUMMARY ;
array days(*) day_: ;
do i = 1 to dim(days) ;
if missing(days(i)) then days(i)=0 ;
end ;
drop i ;
run ;
Another approach:
proc summary data=orig_data nway;
class day_id salesman_id;
var total;
output out=sum(drop=_:) sum=;
run;
proc sort data=sum;
by salesman_id day_id;
run;
proc transpose data=sum out=want(drop=_name_) prefix=day_;
by salesman_id;
* Name each column from the day_id value rather than by position ;
id day_id;
var total;
run;
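You can solve this with a simple DATA step; see the code below.
Sort the data first, then process it with a BY statement: reset day_1 and day_2 to zero at the start of each salesman_id group, accumulate the totals, and write an observation only on the last record of the group.
Let me know if you have any questions.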
data ORIG_DATA ;
input Day_ID Salesman_ID $ Other_field $ total ;
cards ;
1 A R000 10
1 A R002 20
2 A R000 10
2 A R004 30
1 B R002 20
1 B R000 40
1 B R004 0
2 C R003 40
2 C R004 10
1 C R002 20
2 C R002 20
;run;
proc sort data=ORIG_DATA;
by salesman_id;
RUN;
data salesman_id (drop=Day_ID Other_field total);
set orig_data;
by salesman_id;
if first.salesman_id then do;
day_1 = 0;
day_2 = 0;
end;
if day_id=1 then day_1 + total;
if day_id=2 then day_2 + total;
if last.salesman_id then output;
RUN;
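If the number of days is not fixed at two, the same BY-group logic can be written with an array instead of one IF statement per day. The following is only a sketch under assumptions: the macro variable max_day, the dataset names sorted and SUMMARY_ARR, and the array name acc are all made up for illustration, and Day_ID is assumed to hold whole numbers from 1 to max_day.

```sas
/* Hypothetical macro variable: the highest Day_ID expected in the data. */
%let max_day = 2;

/* Sort into a new dataset ("sorted" is an illustrative name) so that
   ORIG_DATA itself is left unchanged. */
proc sort data=ORIG_DATA out=sorted;
  by salesman_id;
run;

data SUMMARY_ARR (keep=salesman_id day_1-day_&max_day);
  set sorted;
  by salesman_id;
  /* One accumulator per day; RETAIN keeps the values across observations. */
  array acc[&max_day] day_1-day_&max_day;
  retain day_1-day_&max_day;
  /* Reset every accumulator at the start of each salesman. */
  if first.salesman_id then do i = 1 to &max_day;
    acc[i] = 0;
  end;
  /* Add total to the matching day; SUM() ignores a missing total. */
  if 1 <= day_id <= &max_day then acc[day_id] = sum(acc[day_id], total);
  /* Write one row per salesman. */
  if last.salesman_id then output;
run;
```

With the sample data this should give the same rows as the SUMMARY table in the question (A: 30, 40; B: 60, 0; C: 20, 70). Records whose Day_ID falls outside 1 to max_day are silently skipped, which may or may not be what you want.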
Similarly:
proc sort data = orig_data(drop = Other_field);
by salesman_id day_id;
run;
data test (drop = total);
retain salesman_id day_id;
set orig_data ;
by salesman_id day_id notsorted;
if first.day_id then sum = total;
else sum + total;
if last.day_id then output;
run;
proc transpose data = test out = t(drop=_:) prefix = day_id_;
by salesman_id;
id day_id;
var sum;
run;