SAS: using group by in proc sql doesn't separate out instances chronologically
Consider the following SAS code:
data test;
    format dt date9.
           ctry_cd $2.   /* character, two-letter country code */
           sn 8.;
    input ctry_cd sn dt;
    datalines;
US 1 20000
US 1 20001
US 1 20002
CA 1 20003
CA 1 20004
US 1 20005
US 1 20006
US 1 20007
ES 2 20001
ES 2 20002
;
run;
proc sql;
    create table check as
    select
        sn,
        ctry_cd,
        min(dt) as begin_dt format date9.,
        max(dt) as end_dt format date9.
    from test
    group by sn, ctry_cd;
quit;
This returns:
1 CA 07OCT2014 08OCT2014
1 US 04OCT2014 11OCT2014
2 ES 05OCT2014 06OCT2014
I'd like the proc sql to distinguish between the country moves; that is, return
1 US 04OCT2014 06OCT2014
1 CA 07OCT2014 08OCT2014
1 US 09OCT2014 11OCT2014
2 ES 05OCT2014 06OCT2014
So it still groups the instances by sn and ctry_cd, but it pays attention to the dates, so that I end up with a timeline.
Then you need to create another grouping variable:
data test;
    set test;
    prev_ctry_cd=lag(ctry_cd); /* ctry_cd from the previous row */
    if prev_ctry_cd ^= ctry_cd then group+1; /* new group whenever the country changes */
run;
proc sql;
    create table check as
    select
        sn,
        ctry_cd,
        min(dt) as begin_dt format date9.,
        max(dt) as end_dt format date9.
    from test
    group by group, sn, ctry_cd
    order by group;
quit;
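To see why the extra variable does the job, it can help to look at the counter on its own. The sketch below assumes the test data set from the question, still in its original row order and with ctry_cd as a character variable. The names test_runs and run_id are made up for illustration; they are not from the answer. lag() returns the value from the previous row, and the sum statement run_id + 1 keeps its value from one row to the next, so the counter only moves when the country changes.

```sas
/* Build the run counter in a separate data set so test is left untouched. */
/* test_runs and run_id are hypothetical names used only for this sketch.  */
data test_runs;
    set test;
    prev_ctry_cd = lag(ctry_cd);                 /* previous row's country (blank on row 1) */
    if prev_ctry_cd ^= ctry_cd then run_id + 1;  /* sum statement: value is retained        */
run;

/* Inspect the counter next to the columns it is derived from. */
proc print data=test_runs noobs;
    var sn ctry_cd dt run_id;
run;
```

With the ten rows from the question, run_id should come out as 1 for the first three US rows, 2 for the two CA rows, 3 for the second US stretch, and 4 for the ES rows. Grouping on that counter is what keeps the two US stretches apart.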
If the data is sorted as in your example, then you can achieve your goal in a single data step, without creating extra variables.
data want;
    keep sn ctry_cd begin_dt end_dt; /* keep only the required variables */
    set test;
    by sn ctry_cd notsorted; /* notsorted option needed as ctry_cd is not in order */
    retain begin_dt; /* retains value until needed */
    if first.ctry_cd then begin_dt=dt; /* store first date for each new ctry_cd */
    if last.ctry_cd then do;
        end_dt=dt; /* store last date for each new ctry_cd */
        output; /* output result */
    end;
    format begin_dt end_dt date9.;
run;