Fill in blank dates for rolling average - CTE in Snowflake
I have two tables, activity and purchase.
Activity table:
user_id date videos_watched
1 2020-01-02 3
1 2020-01-04 5
1 2020-01-07 5
Purchase table:
user_id purchase_date
1 2020-01-01
2 2020-02-02
What I want to do is get a 30-day rolling average of how many videos have been watched since the purchase.
The base query looks like this:
SELECT
DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
AVG(A.VIDEOS_WATCHED)
FROM PURCHASE P
LEFT OUTER JOIN ACTIVITY A ON P.USER_ID = A.USER_ID AND
A.DATE >= P.PURCHASE_DATE AND A.DATE <= DATEADD(DAY, 30, P.PURCHASE_DATE)
GROUP BY 1;
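However, the Activity table only has rows for days on which a video was watched. I want to fill in the days on which no videos were watched.
I have started looking into using a CTE like this: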
WITH cte AS (
SELECT date('2020-01-01') as fdate
UNION ALL
SELECT CAST(DATEADD(day,1,fdate) as date)
FROM cte
WHERE fdate < date('2020-04-01')
) select * from cte
cross join purchases p
left outer join activity a
on p.user_id = a.user_id
and a.fdate = p.purchase_date
and a.date >= p.purchase_date and a.date <= dateadd(day, 30, p.purchase_date)
The end goal is to have something like this:
days_since_purchase videos_watched
1 3
2 0 --CTE coalesce inserted value
3 0
4 5
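I've spent the last few hours trying to get this right, but I still can't really get the hang of it.
If you want to fill in the gaps in the result set, then I think you should be generating integers rather than dates: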
WITH cte AS (
    -- generates the integers 1..4 to match the sample output; adjust the bound as needed
    SELECT 1 AS day_since_purchase
    UNION ALL
    SELECT 1 + day_since_purchase
    FROM cte
    WHERE day_since_purchase < 4
)
SELECT cte.day_since_purchase,
       COALESCE(pa.avg_videos_watched, 0) AS avg_videos_watched  -- 0 for days with no activity
FROM cte LEFT JOIN
     (SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
             AVG(a.videos_watched) AS avg_videos_watched
      FROM purchases p JOIN
           activity a
           ON p.user_id = a.user_id AND
              a.fdate = p.purchase_date AND
              a.date >= p.purchase_date AND
              a.date <= DATEADD(day, 30, p.purchase_date)
      GROUP BY 1
     ) pa
     ON pa.day_since_purchase = cte.day_since_purchase;
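To make the integer-series idea concrete, here is a minimal sketch that runs the same shape of query against the sample rows from the question. It uses Python's built-in sqlite3 as a stand-in for Snowflake, so a few things are assumptions rather than Snowflake syntax: `julianday()` arithmetic replaces `DATEDIFF`, `date(..., '+30 days')` replaces `DATEADD`, and the `fdate` condition is dropped because the sample data has no such column.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (an assumption for
# illustration; the date functions below are SQLite's, not Snowflake's).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (user_id INTEGER, date TEXT, videos_watched INTEGER);
    CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT);
    INSERT INTO activity VALUES
        (1, '2020-01-02', 3), (1, '2020-01-04', 5), (1, '2020-01-07', 5);
    INSERT INTO purchases VALUES (1, '2020-01-01'), (2, '2020-02-02');
""")

QUERY = """
WITH RECURSIVE days(day_since_purchase) AS (
    -- integer series 1..4, one row per day offset we want in the output
    SELECT 1
    UNION ALL
    SELECT day_since_purchase + 1 FROM days WHERE day_since_purchase < 4
)
SELECT d.day_since_purchase,
       COALESCE(pa.avg_videos_watched, 0) AS avg_videos_watched
FROM days d
LEFT JOIN (
    -- averages only for the day offsets that actually have activity rows
    SELECT CAST(julianday(a.date) - julianday(p.purchase_date) AS INTEGER)
               AS day_since_purchase,
           AVG(a.videos_watched) AS avg_videos_watched
    FROM purchases p
    JOIN activity a
      ON a.user_id = p.user_id
     AND a.date >= p.purchase_date
     AND a.date <= date(p.purchase_date, '+30 days')
    GROUP BY day_since_purchase
) pa ON pa.day_since_purchase = d.day_since_purchase
ORDER BY d.day_since_purchase
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample data, the activity on 2020-01-04 falls on day offset 3 (three days after the 2020-01-01 purchase), so the filled-in zeros appear at offsets 2 and 4. Note that this variant averages only over users who have activity on a given offset; users with no activity that day are not counted as zero.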
You can use a recursive query to generate the 30 days that follow each purchase, then bring in the activity table with a left join:
with cte as (
    -- anchor: one row per purchase, at day 0
    select
        purchase_date,
        user_id,
        0 days_since_purchase,
        purchase_date dt
    from purchases
    union all
    -- recursive step: advance one day at a time, up to day 30
    select
        purchase_date,
        user_id,
        days_since_purchase + 1,
        dateadd(day, days_since_purchase + 1, purchase_date)
    from cte
    where days_since_purchase < 30
)
select
    c.days_since_purchase,
    avg(coalesce(a.videos_watched, 0)) avg_videos_watched
from cte c
left join activity a
    on a.user_id = c.user_id
    and a.fdate = c.purchase_date
    and a.date = c.dt
group by c.days_since_purchase
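It is not clear from your question whether the activity table has a column that stores the purchase date each row relates to. Your query uses an fdate column, but your sample data does not include it. I used that column in the query (without it, you might end up counting the same activity under different purchases).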
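As a rough check of the per-purchase calendar approach, the sketch below runs an equivalent query on the question's sample rows using Python's built-in sqlite3 instead of Snowflake. Several details are assumptions made for illustration: SQLite's `date(x, '+N days')` stands in for `DATEADD`, the join uses `user_id` as in the sample data, and the `fdate` condition is omitted because the sample data has no such column (which is safe here only because each user has a single purchase).

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (user_id INTEGER, date TEXT, videos_watched INTEGER);
    CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT);
    INSERT INTO activity VALUES
        (1, '2020-01-02', 3), (1, '2020-01-04', 5), (1, '2020-01-07', 5);
    INSERT INTO purchases VALUES (1, '2020-01-01'), (2, '2020-02-02');
""")

QUERY = """
WITH RECURSIVE cte(user_id, purchase_date, days_since_purchase, dt) AS (
    -- anchor: one row per purchase at day 0
    SELECT user_id, purchase_date, 0, purchase_date FROM purchases
    UNION ALL
    -- recursive step: one more calendar day per iteration, up to day 30
    SELECT user_id, purchase_date, days_since_purchase + 1,
           date(purchase_date, '+' || (days_since_purchase + 1) || ' days')
    FROM cte
    WHERE days_since_purchase < 30
)
SELECT c.days_since_purchase,
       AVG(COALESCE(a.videos_watched, 0)) AS avg_videos_watched
FROM cte c
LEFT JOIN activity a
  ON a.user_id = c.user_id
 AND a.date = c.dt
GROUP BY c.days_since_purchase
ORDER BY c.days_since_purchase
"""

rows = conn.execute(QUERY).fetchall()
for days_since_purchase, avg_videos_watched in rows:
    print(days_since_purchase, avg_videos_watched)
```

Because every purchase gets its own row for every day offset, a user with no activity on a given day contributes a zero to the average. On the sample data that means day 1 averages user 1's three videos with user 2's zero, giving 1.5 rather than 3.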