Finding new versus repeated users in SAS
Given the dataset below, I am trying to count new versus repeated users.
DATE ID Unique_Event
20200901 a12345 1
20200902 a12345 1
20200903 b12345 1
20200903 a12345 1
20200904 c12345 1
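In the dataset above, a12345 appears on multiple dates, so it should count as a "repeated" user, whereas b12345 appears only once, so it is a "new" user. Note that this is only sample data; the real data is very large. I tried the code below, but the counts are wrong. Ideally, total_num_users - num_new_users should give the repeated users, but the count I get is incorrect. Am I missing something?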
Expected Output:
Month new_users repeated_users
9 2 1
Code:
data user_events;
  set user_events;
  new_date=input(date,yymmdd10.);
run;
proc sql;
  select month(new_date) as mm,
         count(distinct vv.id) as total_num_users,
         count(distinct case when v.new_date = vv.minva then v.id end) as num_new_users,
         (count(distinct vv.id) - count(distinct case when v.new_date = vv.minva then id end)
         ) as num_repeated_users
  from user_events v inner join
       (select t.id, min(new_date) as minva
        from user_events t
        group by t.id
       ) vv
       on v.id = vv.id
  group by 1
  order by 1;
quit;
In a sub-select, you can count the distinct DATE values for each ID to determine its new / repeated status. All of the aggregate counts over IDs are then computed from the sub-select.
proc sql;
  create table freq as
  select
      count(*) as id_count
    , sum (status='repeated') as id_repeated_count /* a true comparison evaluates to 1, so sum() counts the matches */
    , sum (status='new') as id_new_count
  from
    ( select
          id
        , case
            when count(distinct date) > 1 then 'repeated'
            else 'new'
          end as status
      from
        user_events
      group by
        id
    ) as statuses
  ;
quit;
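As a rough cross-check of the sub-select approach above, here is a minimal sketch that runs the same query shape against the sample rows. It uses Python's built-in sqlite3 module rather than SAS, so the in-memory table, the text DATE column, and the variable names are assumptions made for illustration only. The second query mirrors the join from the question (without the month grouping, since all sample rows fall in September) and suggests why it finds no repeated users: every ID has a row on its own earliest date, so every ID is counted as new.

```python
import sqlite3

# Sample rows from the question: (date, id, unique_event).
rows = [
    ("20200901", "a12345", 1),
    ("20200902", "a12345", 1),
    ("20200903", "b12345", 1),
    ("20200903", "a12345", 1),
    ("20200904", "c12345", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_events (date TEXT, id TEXT, unique_event INTEGER)")
conn.executemany("INSERT INTO user_events VALUES (?, ?, ?)", rows)

# Sub-select approach: classify each ID once, then aggregate the statuses.
status_query = """
SELECT COUNT(*)                 AS id_count,
       SUM(status = 'repeated') AS id_repeated_count,
       SUM(status = 'new')      AS id_new_count
FROM (
    SELECT id,
           CASE WHEN COUNT(DISTINCT date) > 1 THEN 'repeated' ELSE 'new' END AS status
    FROM user_events
    GROUP BY id
) AS statuses
"""
id_count, id_repeated_count, id_new_count = conn.execute(status_query).fetchone()
print(id_count, id_repeated_count, id_new_count)  # 3 1 2

# Join from the question: each ID always matches its own minimum date,
# so the "new" count equals the total and the difference is always zero.
join_query = """
SELECT COUNT(DISTINCT vv.id) AS total_num_users,
       COUNT(DISTINCT CASE WHEN v.date = vv.minva THEN v.id END) AS num_new_users
FROM user_events v
INNER JOIN (SELECT id, MIN(date) AS minva FROM user_events GROUP BY id) vv
        ON v.id = vv.id
"""
total_num_users, num_new_users = conn.execute(join_query).fetchone()
print(total_num_users, num_new_users)  # 3 3
conn.close()
```

The first result matches the expected output in the question (2 new, 1 repeated); the second shows the subtraction in the original query yielding 0 repeated users on this sample.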
An alternative solution that does not use proc sql (although I know you tagged the question "proc sql").
data final;
  set user_events;
  Month=month(new_date);
run;
proc sort data=final; by Month ID;
run;
data final;
  set final;
  by Month ID;
  if first.Month then do;
    new_users=0;
    repeated_users=0;
  end;
  if last.ID then do;
    if first.ID then
      new_users+1;
    else
      repeated_users+1;
  end;
  if last.Month then
    output;
  keep Month new_users repeated_users;
run;
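To make the first./last. logic above easier to follow, here is a small sketch of the same idea in Python rather than SAS. The list of tuples, the month extraction by string slicing, and the summary dictionary are all assumptions for illustration; the point is only that an ID whose BY group holds a single row is counted as new, and any other ID as repeated.

```python
from itertools import groupby

# Sample rows from the question: (date, id).
events = [
    ("20200901", "a12345"),
    ("20200902", "a12345"),
    ("20200903", "b12345"),
    ("20200903", "a12345"),
    ("20200904", "c12345"),
]

# Rough equivalent of `proc sort; by Month ID`: key each row by (month, id) and sort.
keyed = sorted((int(date[4:6]), user_id) for date, user_id in events)

summary = {}
for (month, user_id), group_rows in groupby(keyed):
    n_rows = sum(1 for _ in group_rows)
    counts = summary.setdefault(month, {"new_users": 0, "repeated_users": 0})
    # first.ID and last.ID are both true on the same row only when the
    # (Month, ID) group contains exactly one row.
    if n_rows == 1:
        counts["new_users"] += 1
    else:
        counts["repeated_users"] += 1

print(summary)  # {9: {'new_users': 2, 'repeated_users': 1}}
```

Note that this logic counts rows per ID, not distinct dates, so an ID with two events on the same date would be classified as repeated here, unlike with count(distinct date).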
Since you are using proc sql, this is a SQL question rather than a SAS question.
Try something like this:
proc sql;
  select ID, count(Unique_Event)
  from <that table>
  group by ID
  order by ID;
quit;