How to assign longitudinal event data to phases, using a dataset of date cutpoints?

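I have a dataset that lists purchase dates for a number of products. A separate dataset lists the cut dates that separate the phases of each product's marketing campaign. I want to assign each purchase a phase number (1 to n) according to the campaign phase in which the purchase date falls. Each product has its own campaign, with its own cut dates.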

In my head, what I think I want is: "For each purchase event date, look up the cut-dates for that product's marketing campaign and see how many cut-dates had passed as of the purchase date, and add 1 to compute the Phase number."

So, given data like this:

data have;  *Purchase events;
  input product $ Date :mmddyy10.;
  format date mmddyy10.;
  cards;
A 1/1/2015
A 3/1/2015
A 3/1/2015
A 6/1/2015
A 9/1/2015
B 1/1/2015
B 3/1/2015
B 6/1/2015
B 9/1/2015
C 1/1/2015
;
run;

data cut; *cut dates for marketing campaign;
input product $ CutDate :mmddyy10.;
format cutdate mmddyy10.;
cards;
A 2/1/2015
B 4/1/2015
B 7/1/2015
;
run;

I want:

product          Date    Phase
   A       01/01/2015      1
   A       03/01/2015      2
   A       03/01/2015      2
   A       06/01/2015      2
   A       09/01/2015      2
   B       01/01/2015      1
   B       03/01/2015      1
   B       06/01/2015      2
   B       09/01/2015      3
   C       01/01/2015      1

I have been trying a correlated subquery approach, which seems to work, but I feel there must be a better way.

proc sql;
  create table want as
  select h.*
        ,coalesce
          (
           (select count(*)
            from cut c
            where h.product=c.product and c.cutdate<=h.date
            group by product
            )
           ,0
          )+1 as Phase 
    from have as h
  ;
quit;

My real data has hundreds of products, each with 0 to 4 cut dates, and millions of purchase events.

You can solve this with a hash table:

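data want;
    if 0 then set cut;  /* put CutDate in the PDV without reading any rows */

    if _N_ = 1 then do;
        /* Load all cut dates once; multidata keeps several dates per product. */
        /* The loop below assumes cut is sorted by CutDate within each product. */
        declare hash h(dataset: 'cut', multidata: 'y');
        h.defineKey('product');
        h.defineData('CutDate');
        h.defineDone();
        call missing(product, CutDate);
    end;

    set have;

    Phase = 1;
    rc = h.find();  /* first cut date for this product, if any */
    /* Add 1 for each cut date on or before the purchase date */
    /* (same rule as cutdate<=date in the SQL above).         */
    do while(rc = 0 and Date >= CutDate);
        Phase = Phase + 1;
        rc = h.find_next();
    end;
    drop rc CutDate;
run;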
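
If you cannot guarantee that the cut dates are stored in ascending order within each product, a variant that visits every cut date for the product and counts those on or before the purchase date may be safer. The following is only a sketch under that assumption: the output name want_any_order is hypothetical, and it treats a purchase made on a cut date as belonging to the next phase, as the cutdate<=date condition in the question's SQL does.

```sas
data want_any_order;                   /* hypothetical output name */
    if 0 then set cut;                 /* put CutDate in the PDV */

    if _N_ = 1 then do;
        declare hash h(dataset: 'cut', multidata: 'y');
        h.defineKey('product');
        h.defineData('CutDate');
        h.defineDone();
    end;

    set have;

    Phase = 1;
    rc = h.find();
    do while (rc = 0);                 /* visit every cut date for this product */
        if CutDate <= Date then Phase = Phase + 1;
        rc = h.find_next();
    end;

    drop rc CutDate;
run;
```

With at most 4 cut dates per product, scanning all of them for each purchase should cost very little, and on the sample data this should give the same Phase values as the table in the question.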