Finding new versus repeated users in SAS

Given the dataset below, I am trying to find new versus repeated users.

DATE        ID       Unique_Event
20200901    a12345   1
20200902    a12345   1
20200903    b12345   1
20200903    a12345   1
20200904    c12345   1
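
To reproduce the problem, the sample rows can be loaded with a data step along the following lines. This is only a sketch: it assumes DATE is stored as an 8-character string (which is what the input() call in the code further down suggests) and that ID is a 6-character string. The real column types are not stated in the post.

data user_events;
    /* Assumption: date and id are character, unique_event is numeric */
    input date :$8. id :$6. unique_event;
    datalines;
20200901 a12345 1
20200902 a12345 1
20200903 b12345 1
20200903 a12345 1
20200904 c12345 1
;
run;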

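In the dataset above, a12345 appears on multiple dates, so it should count as a "repeated" user, whereas b12345 appears only once and is therefore a "new" user. Note that this is only sample data; the actual data is very large. I tried the code below, but the counts come out wrong. Ideally, total_num_users - num_new_users should give the number of repeated users, but the numbers I get are incorrect. Am I missing something?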

Expected Output:
Month   new_users   repeated_users
9        2           1

Code:

data user_events;
set user_events;
new_date=input(date,yymmdd10.);
run;
proc sql;
select month(new_date) as mm,
       count(distinct vv.id) as total_num_users,
       count(distinct case when v.new_date = vv.minva then v.id end) as num_new_users,
   (count(distinct vv.id) - count(distinct case when v.new_date = vv.minva then id end)
   ) as num_repeated_users
from user_events v inner join
     (select t.id, min(new_date) as minva
      from user_events t
      group by t.id
     ) vv
     on v.id = vv.id
group by  1
order by 1;
quit;

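In a sub-select, you can count the number of distinct DATE values for each ID to determine its new / repeated status. All of the aggregate counts over ids are then computed from that sub-select.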

proc sql;
  create table freq as
  select 
    count(*) as id_count
  , sum (status='repeated') as id_repeated_count   /* sum counts a logic eval state */
  , sum (status='new')      as id_new_count
  from 
    ( select 
          id
        , case 
            when count(distinct date) > 1 then 'repeated' 
            else 'new'
          end as status
      from 
        user_events
      group by
        id
    ) as statuses
  ;
quit;
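
The query above returns one overall row, while the expected output in the question is broken down by month. A possible per-month variant of the same idea is sketched below. It assumes new_date is the numeric SAS date created in the question's data step; the names freq_by_month, mm, new_users and repeated_users are illustrative only. Note that month() ignores the year, and that an ID is classified within each month separately, which may or may not be the intended definition of "new".

proc sql;
  create table freq_by_month as
  select
      mm
    , sum(status = 'new')      as new_users       /* IDs seen on one date in the month */
    , sum(status = 'repeated') as repeated_users  /* IDs seen on several dates in the month */
  from
    ( select
          month(new_date) as mm
        , id
        , case
            when count(distinct new_date) > 1 then 'repeated'
            else 'new'
          end as status
      from
        user_events
      group by
        calculated mm, id
    ) as statuses
  group by
    mm
  order by
    mm
  ;
quit;

On the five sample rows this gives a single row for month 9 with 2 new users and 1 repeated user, which matches the expected output in the question.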

An alternative solution that does not use proc sql (although I know you tagged the question with "proc sql").

data final;
    set user_events;
    Month=month(new_date);
run;
proc sort data=final;
    by Month ID;
run;

data final;
    set final;
    by Month ID;

    if first.Month then do;
        new_users=0;
        repeated_users=0;
    end;
    if last.ID then do;
        if first.ID then
            new_users+1;
        else
            repeated_users+1;
    end;
    if last.Month then
        output;

    keep Month new_users repeated_users;
run;

Since you are using proc sql, this is an SQL question rather than a SAS question. Try something like:

proc sql;
    select ID,count(Unique_Event)
    from <that table>
    group by ID
    order by ID;
quit;