How to assign longitudinal event data to phases, using a dataset of date cutpoints?
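I have a dataset that lists the purchase dates of certain products. A separate dataset lists the cut dates for the different phases of each product's marketing campaign. I want to assign each purchase date a phase number (1 to n), according to the campaign phase in which the purchase occurred. Each product has its own campaign, with its own cut dates.
In my head, what I want is: "For each purchase event date, look up the cut-dates for that product's marketing campaign and see how many cut-dates had passed as of the purchase date, and add 1 to compute the Phase number."
So, given data like this: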
data have; *Purchase events;
  input product $ Date :mmddyy10.; *product is character, so it needs $;
  format date mmddyy10.;
cards;
A 1/1/2015
A 3/1/2015
A 3/1/2015
A 6/1/2015
A 9/1/2015
B 1/1/2015
B 3/1/2015
B 6/1/2015
B 9/1/2015
C 1/1/2015
;
run;
data cut; *cut dates for marketing campaign;
  input product $ CutDate :mmddyy10.; *product is character, so it needs $;
  format cutdate mmddyy10.;
cards;
A 2/1/2015
B 4/1/2015
B 7/1/2015
;
run;
I want this result:
product Date Phase
A 01/01/2015 1
A 03/01/2015 2
A 03/01/2015 2
A 06/01/2015 2
A 09/01/2015 2
B 01/01/2015 1
B 03/01/2015 1
B 06/01/2015 2
B 09/01/2015 3
C 01/01/2015 1
I've been experimenting with a correlated subquery, which seems to work, but I feel there must be a better way.
proc sql;
  create table want as
  select h.*
        ,coalesce(
           (select count(*)
              from cut c
              where h.product=c.product and c.cutdate<=h.date
              group by product
           )
           ,0
         )+1 as Phase
  from have as h
  ;
quit;
My real data has hundreds of products, 0 to 4 cut dates per product, and millions of purchase events.
You can solve this with a hash table:
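data want;
  if 0 then set cut;  /* define product and CutDate in the PDV without reading rows */
  if _N_ = 1 then do;
    /* Load all cut dates once; multidata allows several cut dates per product. */
    declare hash h(dataset: 'cut', multidata: 'y');
    h.defineKey('product');
    h.defineData('CutDate');
    h.defineDone();
    call missing(product, CutDate);
  end;
  set have;
  Phase = 1;
  rc = h.find();  /* first cut date for this product; nonzero rc means there is none */
  /* Assumes cut dates were loaded in ascending order within each product. */
  /* >= matches the c.cutdate<=h.date condition in the question's query. */
  do while(rc = 0 and Date >= CutDate);
    Phase = Phase + 1;
    rc = h.find_next();
  end;
  drop rc CutDate;  /* keep only product, Date and Phase, as in the wanted output */
run;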
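One caveat about the step above: the loop stops at the first cut date that the purchase has not reached, so it relies on the cut dates for each product being loaded in ascending order. If that order cannot be guaranteed, a variant that visits every cut date for the product and counts the ones reached may be safer. The following is only a sketch under that assumption. The dataset names `cut_unsorted`, `have_demo` and `want_any_order` are made up for the illustration, and the comparison uses `>=` to match the `c.cutdate<=h.date` condition in the question's query.

```sas
/* Hypothetical demo data: the cut dates for B are deliberately out of order. */
data cut_unsorted;
  input product $ CutDate :mmddyy10.;
  format CutDate mmddyy10.;
  cards;
B 7/1/2015
B 4/1/2015
A 2/1/2015
;
run;

/* Hypothetical demo data: a few purchase events, including a product with no cut dates. */
data have_demo;
  input product $ Date :mmddyy10.;
  format Date mmddyy10.;
  cards;
A 3/1/2015
B 6/1/2015
B 9/1/2015
C 1/1/2015
;
run;

data want_any_order;
  if 0 then set cut_unsorted;  /* define product and CutDate in the PDV */
  if _N_ = 1 then do;
    declare hash h(dataset: 'cut_unsorted', multidata: 'y');
    h.defineKey('product');
    h.defineData('CutDate');
    h.defineDone();
  end;
  set have_demo;
  Phase = 1;
  rc = h.find();               /* first stored cut date for this product, if any */
  do while (rc = 0);           /* visit every cut date stored for the product */
    if Date >= CutDate then Phase = Phase + 1;
    rc = h.find_next();
  end;
  drop rc CutDate;
run;
```

With at most 4 cut dates per product, scanning all of them costs very little per purchase row. Alternatively, sorting the cut-date table by product and cut date (for example with PROC SORT) before loading it into the hash keeps the early-exit loop valid.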