Finding new versus repeated users in SAS

Question

Given the dataset below, I am trying to find new versus repeated users.

DATE        ID       Unique_Event
20200901    a12345   1
20200902    a12345   1
20200903    b12345   1
20200903    a12345   1
20200904    c12345   1

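In the dataset above, a12345 appears on multiple dates, so it should count as a "repeated" user, whereas b12345 appears only once and is therefore a "new" user. Note that this is only sample data; the actual data is very large. I tried the code below, but I am not getting the right counts. Ideally, total_num_users - num_new_users should give the repeated users, but the counts I get are incorrect. Am I missing something?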

Expected Output:
Month   new_users   repeated_users
9        2           1

Code:

data user_events;
set user_events;
new_date=input(date,yymmdd10.);
run;
proc sql;
select month(new_date) as mm,
       count(distinct vv.id) as total_num_users,
       count(distinct case when v.new_date = vv.minva then v.id end) as num_new_users,
   (count(distinct vv.id) - count(distinct case when v.new_date = vv.minva then id end)
   ) as num_repeated_users
from user_events v inner join
     (select t.id, min(new_date) as minva
      from user_events t
      group by t.id
     ) vv
     on v.id = vv.id
group by  1
order by 1;
quit;

Answer 1

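In a sub-select, for each ID you can count the number of distinct DATE values to determine the new / repeated status. All of the aggregate counts over IDs are then computed from that sub-select.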

proc sql;
  create table freq as
  select 
    count(*) as id_count
  , sum (status='repeated') as id_repeated_count   /* a true comparison evaluates to 1, so sum() counts the matches */
  , sum (status='new')      as id_new_count
  from 
    ( select 
          id
        , case 
            when count(distinct date) > 1 then 'repeated' 
            else 'new'
          end as status
      from 
        user_events
      group by
        id
    ) as statuses
  ;
quit;
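
The query above returns overall counts. To get the per-month layout shown in the question's expected output, one possible extension (a sketch, not part of the original answer) is to group the sub-select by month as well as by ID. The table name monthly_freq, the alias mm, and the datalines step that rebuilds the sample rows are assumptions made for illustration; new_date follows the question's own data step.

```sas
/* Sample rows copied from the question; new_date is a SAS date value. */
data user_events;
  input date :$8. id :$6. unique_event;
  new_date = input(date, yymmdd8.);
  format new_date yymmdd10.;
  datalines;
20200901 a12345 1
20200902 a12345 1
20200903 b12345 1
20200903 a12345 1
20200904 c12345 1
;
run;

proc sql;
  create table monthly_freq as
  select
      mm                        as Month
    , sum(status = 'new')       as new_users
    , sum(status = 'repeated')  as repeated_users
  from
    ( /* one row per month and ID, classified by distinct dates in that month */
      select
          month(new_date) as mm
        , id
        , case
            when count(distinct new_date) > 1 then 'repeated'
            else 'new'
          end as status
      from user_events
      group by calculated mm, id
    ) as statuses
  group by mm
  ;
quit;
```

On the sample rows this gives Month 9 with new_users = 2 and repeated_users = 1. Note that month() alone merges the same calendar month across years, and that an ID is classified separately within each month; if the real data spans several years, grouping on year and month together is probably what you want.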

Answer 2

An alternative solution that does not use proc sql (although I know you tagged the question with "proc sql").

data final;
    set user_events;
    Month=month(new_date);
run;
proc sort data=final;
    by Month ID;
run;

data final;
    set final;
    by Month ID;

    if first.Month then do;
        new_users=0;
        repeated_users=0;
    end;
    if last.ID then do;
        if first.ID then
            new_users+1;
        else
            repeated_users+1;
    end;
    if last.Month then
        output;

    keep Month new_users repeated_users;
run;
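
One caveat with the data step approach: it counts rows, so an ID with two rows on the same day would be tallied as repeated. If that is not wanted, a possible pre-step (an assumption, not part of the original answer) is to reduce the data to one row per ID and date first. The dataset name user_days and the duplicated b12345 row below are made up for illustration.

```sas
/* Hypothetical sample: b12345 has two rows on the same day. */
data user_events;
  input date :$8. id :$6. unique_event;
  new_date = input(date, yymmdd8.);
  format new_date yymmdd10.;
  datalines;
20200901 a12345 1
20200902 a12345 1
20200903 b12345 1
20200903 b12345 1
20200904 c12345 1
;
run;

/* NODUPKEY keeps the first row for each ID/date pair and drops the rest. */
proc sort data=user_events out=user_days nodupkey;
  by id new_date;
run;

/* user_days now has one row per ID and day. */
proc print data=user_days noobs;
run;
```

Reading user_days instead of user_events in the first data step above would then count b12345 as new rather than repeated.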

Answer 3

Since you are using proc sql, this is an SQL question rather than a SAS question. Try something like:

proc sql;
    select ID,count(Unique_Event)
    from <that table>
    group by ID
    order by ID;
quit;

sas

proc-sql