Fill in blank dates for rolling average - CTE in Snowflake

I have two tables: activity and purchase.

Activity table:

user_id     date      videos_watched
   1     2020-01-02        3
   1     2020-01-04        5
   1     2020-01-07        5

Purchase table:

user_id  purchase_date 
   1       2020-01-01 
   2       2020-02-02

What I'd like to do is get a 30-day rolling average of how many videos were watched since the purchase.

The basic query looks like this:

    SELECT
    DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
    AVG(A.VIDEOS_WATCHED)
    FROM PURCHASE P
    LEFT OUTER JOIN ACTIVITY A ON P.USER_ID = A.USER_ID AND
        A.DATE >= P.PURCHASE_DATE AND A.DATE <= DATEADD(DAY, 30, P.PURCHASE_DATE)
    GROUP BY 1;

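However, the activity table only has rows for days on which a video was logged. I'd like to fill in the days on which no videos were watched.

I've started looking into using a CTE like this: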

    WITH cte AS (
        SELECT date('2020-01-01') AS fdate
        UNION ALL
        SELECT CAST(DATEADD(day, 1, fdate) AS date)
        FROM cte
        WHERE fdate < date('2020-04-01')
    )
    SELECT *
    FROM cte
    CROSS JOIN purchases p
    LEFT OUTER JOIN activity a
        ON p.user_id = a.user_id
        AND a.fdate = p.purchase_date
        AND a.date >= p.purchase_date
        AND a.date <= DATEADD(day, 30, p.purchase_date)

The end goal is to have something like this:

days_since_purchase    videos_watched
        1                   3
        2                   0 --CTE coalesce inserted value
        3                   0
        4                   5

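I've spent the last few hours trying to get this right, but I still can't really get the hang of it.

If you want to fill in the gaps in the result set, then I think you should generate integers rather than dates: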

    WITH cte AS (
        -- Generate the day offsets to report on (1..4 here, matching the sample output)
        SELECT 1 AS day_since_purchase
        UNION ALL
        SELECT 1 + day_since_purchase
        FROM cte
        WHERE day_since_purchase < 4
    )
    SELECT cte.day_since_purchase,
           COALESCE(pa.avg_videos_watched, 0) AS avg_videos_watched
    FROM cte
    LEFT JOIN (
        SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
               AVG(a.videos_watched) AS avg_videos_watched
        FROM purchases p
        JOIN activity a
            ON p.user_id = a.user_id
            AND a.fdate = p.purchase_date
            AND a.date >= p.purchase_date
            AND a.date <= DATEADD(day, 30, p.purchase_date)
        GROUP BY 1
    ) pa
        ON pa.day_since_purchase = cte.day_since_purchase;
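
As a variation on the idea above, the integers can also be produced without recursion. The following is only a sketch that swaps in Snowflake's `GENERATOR` table function for the recursive CTE; the `days` CTE name is my own, the column names follow the sample data in the question (`videos_watched`), and I left out the `fdate` condition because that column does not appear in the sample data.

```sql
-- Sketch, not a drop-in replacement: assumes tables purchases(user_id, purchase_date)
-- and activity(user_id, date, videos_watched) as in the question's sample data.
WITH days AS (
    -- SEQ4() is not guaranteed to be gap-free, so number the rows explicitly
    SELECT ROW_NUMBER() OVER (ORDER BY SEQ4()) AS day_since_purchase
    FROM TABLE(GENERATOR(ROWCOUNT => 30))
)
SELECT d.day_since_purchase,
       COALESCE(pa.avg_videos_watched, 0) AS avg_videos_watched
FROM days d
LEFT JOIN (
    SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
           AVG(a.videos_watched) AS avg_videos_watched
    FROM purchases p
    JOIN activity a
        ON p.user_id = a.user_id
        AND a.date >= p.purchase_date
        AND a.date <= DATEADD(DAY, 30, p.purchase_date)
    GROUP BY 1
) pa
    ON pa.day_since_purchase = d.day_since_purchase
ORDER BY d.day_since_purchase;
```

Note that with this shape the average for a given day is taken only over users who have an activity row on that day; the `COALESCE` supplies 0 only when nobody watched anything that day.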

You can use a recursive query to generate the 30 days following each purchase, then bring in the activity table:

    with cte as (
        -- anchor: one row per purchase, at day 0
        select
            purchase_date,
            user_id,
            0 days_since_purchase,
            purchase_date dt
        from purchases
        union all
        -- recursive step: add one day at a time, up to day 30
        select
            purchase_date,
            user_id,
            days_since_purchase + 1,
            dateadd(day, days_since_purchase + 1, purchase_date)
        from cte
        where days_since_purchase < 30
    )
    select
        c.days_since_purchase,
        avg(coalesce(a.videos_watched, 0)) avg_videos_watched
    from cte c
    left join activity a
        on  a.user_id = c.user_id
        and a.fdate = c.purchase_date
        and a.date = c.dt
    group by c.days_since_purchase

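It is not clear from your question whether the activity table has a column that stores the purchase date each row relates to. Your query includes an fdate column, but your sample data does not. I used that column in the query (without it, you could end up counting the same activity under different purchases).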
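
If the activity table really has no such column, the same approach can be sketched with the join on user and date only. This assumes the tables look exactly like the sample data in the question (`user_id`, `date`, `videos_watched` in activity; `user_id`, `purchase_date` in purchases), which is my reading rather than something the question confirms.

```sql
-- Sketch under the assumption that activity has no fdate column.
with cte as (
    select
        user_id,
        purchase_date,
        0 as days_since_purchase,
        purchase_date as dt
    from purchases
    union all
    select
        user_id,
        purchase_date,
        days_since_purchase + 1,
        dateadd(day, days_since_purchase + 1, purchase_date)
    from cte
    where days_since_purchase < 30
)
select
    c.days_since_purchase,
    -- days with no activity row contribute 0 to the average
    avg(coalesce(a.videos_watched, 0)) as avg_videos_watched
from cte c
left join activity a
    on  a.user_id = c.user_id
    and a.date = c.dt
group by c.days_since_purchase
order by c.days_since_purchase;
```

With this version, a user who makes two purchases fewer than 30 days apart will have the overlapping activity rows counted once under each purchase, which is the double counting mentioned above.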