How to display changes in logins from one week to another segmented by cohorts

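Objective: I need to segment emails by subscription month, which determines the cohort. In other words, everyone who subscribed in January 2018 is in one cohort, and everyone who subscribed in February 2018 is in another. I then need to look at their login activity from one week to the next. If 100 subscribers from the January 2018 cohort logged in during ISO_WEEK 2 of 2019, and 70 of those subscribers logged in again during ISO_WEEK 3, the retention rate is 70%.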

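Problem: I am not sure how to write my query so that the cohorts (e.g. Jan2018, Feb2018, Mar2018) form the first column, and the following columns hold the count of distinct emails with login activity for each ISO_WEEK, starting in 2019.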

Sample data:

CREATE TABLE member
    ([email] varchar(50), [creation_date] Datetime)
INSERT INTO member
VALUES
    ('player123@google.com', '2018-01-01 05:00:00'),
    ('player999@google.com', '2018-01-30 12:00:00'),
    ('player555@google.com', '2018-05-14 20:15:00')
CREATE TABLE login
    ([email] varchar(100), [login_date] Datetime)
INSERT INTO login
VALUES
    ('player123@google.com', '2019-01-07 05:30:00'),
    ('player123@google.com', '2019-01-07 09:30:00'),
    ('player123@google.com', '2019-01-08 08:30:00'),
    ('player123@google.com', '2019-01-15 06:30:00'),
    ('player999@google.com', '2019-01-08 11:30:00'),
    ('player999@google.com', '2019-01-10 07:30:00'),
    ('player555@google.com', '2019-01-08 04:30:00')

What I have tried:

;with
cte1 AS (
    SELECT CAST(Creation_Date AS Date) AS Creation_Date
        ,CONCAT(DATEPART(MONTH,Creation_Date),'-',DATEPART(YEAR,Creation_Date)) AS Cohort
        ,email AS Emails
    FROM member
        ),
cte2 AS (
    SELECT Logins
        ,yy
        ,login_ISOWeeks
        ,Emails
    FROM (
        SELECT CAST(login_date as Date) AS Logins
            ,DATEPART(YEAR, login_date) AS yy
            ,DATEPART(ISO_WEEK,login_date) AS login_ISOWeeks
            ,email AS Emails
            ,ROW_NUMBER()
                OVER(PARTITION BY DATEPART(YEAR, login_date), DATEPART(ISO_WEEK,login_date), email ORDER BY login_date ASC) AS week_count
        FROM login) as f_log
    WHERE f_log.week_count = 1
        )

SELECT cte1.Creation_Date
    ,cte1.Cohort
    ,cte2.yy
    ,cte2.login_ISOWeeks
    ,cte1.Emails
FROM cte1
INNER JOIN cte2 ON cte1.Emails=cte2.Emails

Expected output:

Cohort     2019_2  2019_3
jan 2018   2       1
may 2018   1       0

Your data has a lot of oddities. Why is the join key an email address rather than a member ID? Why are members' emails "created" multiple times?

To keep the join from getting out of hand, I aggregate each table before doing the join. This produces the results you want:

select datename(year, m.creation_date) + '-' + datename(month, m.creation_date) as yyyymm,
       count(distinct m.email) as num_members,
       -- the join yields one row per member per login week, so each sum counts distinct members
       sum(case when l.yyyy = 2019 and l.isoweek = 2 then 1 else 0 end) as cnt_201902,
       sum(case when l.yyyy = 2019 and l.isoweek = 3 then 1 else 0 end) as cnt_201903
from (-- one row per email, keeping the earliest creation date
      select m.email, min(creation_date) as creation_date
      from member m
      group by m.email
     ) m left join
     (-- one row per email per login week; year() is the calendar year,
      -- which can differ from the ISO year for days around New Year
      select distinct l.email, year(l.login_date) as yyyy, datepart(iso_week, l.login_date) as isoweek
      from login l
     ) l
     on m.email = l.email
group by datename(year, m.creation_date) + '-' + datename(month, m.creation_date)
order by yyyymm;

Here is a db<>fiddle.
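
The query above counts the distinct members active in each week independently. The 70% example in the question is slightly different: it asks how many of the members active in week 2 came back in week 3. A minimal sketch of that calculation follows, assuming the same `member` and `login` tables. The CTE names (`m`, `l`, `w`) and the column names (`in_w2`, `in_w3`, `active_w2`, `retained_w3`, `retention_pct`) are made up for illustration, and weeks 2 and 3 of 2019 are hard-coded to match the example. It also derives the ISO year from the ISO week instead of using `year()`, which is a swapped-in detail and not part of the query above.

```sql
;with m as (
    -- one row per member, using the earliest creation date for the cohort
    select email, min(creation_date) as creation_date
    from member
    group by email
),
l as (
    -- one row per member per ISO week
    -- shifting each date toward week 26 gives the ISO year, so a day such as
    -- 2018-12-31 (ISO week 1 of 2019) is assigned to 2019
    select distinct
           email,
           year(dateadd(day, 26 - datepart(iso_week, login_date), login_date)) as iso_yyyy,
           datepart(iso_week, login_date) as isoweek
    from login
),
w as (
    -- per-member flags: 1 if the member logged in during that week, else 0
    select m.email,
           datename(year, m.creation_date) + '-' + datename(month, m.creation_date) as cohort,
           max(case when l.iso_yyyy = 2019 and l.isoweek = 2 then 1 else 0 end) as in_w2,
           max(case when l.iso_yyyy = 2019 and l.isoweek = 3 then 1 else 0 end) as in_w3
    from m
    left join l on m.email = l.email
    group by m.email, m.creation_date
)
select cohort,
       sum(in_w2)         as active_w2,    -- members active in week 2
       sum(in_w2 * in_w3) as retained_w3,  -- of those, members also active in week 3
       -- nullif avoids a divide-by-zero for cohorts with nobody active in week 2
       100.0 * sum(in_w2 * in_w3) / nullif(sum(in_w2), 0) as retention_pct
from w
group by cohort
order by cohort;
```

With the sample data, the January 2018 cohort has two members active in week 2 and one of them active again in week 3, so its retention is 50%. The May 2018 cohort has one member active in week 2 and none in week 3, so its retention is 0%.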