Applying a windowed count over weeks with PostgreSQL
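I have a reviews table that looks like this:
I am trying to produce a result that returns 4 columns:
1. The week, starting on Monday (yyyy-mm-dd)
2. The number of movie reviews in that week
3. The number of distinct movies reviewed in the previous 30 days
4. The number of distinct movies with 3 or more reviews in the previous 30 days
I have 1, 2 and 3 working (I think), but I can't figure out how to return 4.
This query gives 1, 2 and 3: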
WITH week_dates AS (
    SELECT
        date(date_trunc('week', f.updated_at::date)) AS week_date,
        count(*) Movie_Reviews
    FROM Reviews f
    WHERE submitted = TRUE AND about_type = 'Movie'
    GROUP BY week_date
)
SELECT
    wd.*,
    (SELECT
        count(DISTINCT ff.about_id) Fresh_Reviews
     FROM Reviews ff
     WHERE ff.submitted = TRUE
       AND ff.about_type = 'Movie' -- reviewed within last 30 days
       AND ff.updated_at <= wd.week_date
       AND ff.updated_at > wd.week_date - INTERVAL '30 days'
    ) Freshly_Reviewed
FROM week_dates wd
ORDER BY wd.week_date ASC
Does this help? It also returns the count of distinct movies that have 3 or more reviews in the previous 30 days:
WITH
-- the reviews CTE is here just in place of your reviews table
reviews AS (
    SELECT
        *
    FROM
        (
            VALUES
                ('2014-08-02', 'Movie', True, 'Modern Times'),
                ('2016-10-21', 'Movie', True, 'Enter the Matrix'),
                ('2016-10-22', 'Movie', True, 'Enter the Matrix'),
                ('2016-10-23', 'Movie', True, 'Enter the Matrix'),
                ('2016-11-01', 'Movie', True, 'Citizen Kane'),
                ('2016-11-02', 'Movie', True, 'Citizen Kane'),
                ('2016-11-02', 'Movie', True, 'Citizen Kane'),
                ('2016-11-10', 'Movie', True, 'Blade Runner'),
                ('2016-11-17', 'Album', False, 'The Chronic'),
                ('2018-01-02', 'Movie', True, 'Citizen Kane'),
                ('2018-02-01', 'Movie', True, 'Conquest of Paradise'),
                ('2018-02-15', 'Movie', True, 'Modern Times'),
                ('2018-02-27', 'Movie', True, 'Modern Times'),
                ('2018-03-01', 'Movie', True, 'Citizen Kane'),
                ('2018-03-01', 'Movie', True, 'Modern Times'),
                ('2018-03-02', 'Movie', True, 'Wolf from Wall Street'),
                ('2018-03-02', 'Album', False, 'The Chronic'),
                ('2018-03-03', 'Movie', True, 'Wolf from Wall Street'),
                ('2018-03-12', 'Movie', True, 'Into the Wild')
        ) AS t(updated_at, about_type, submitted, about_id)
    WHERE
        submitted = TRUE
        AND about_type = 'Movie'
),
-- prepare the weeks and their review counts (columns 1 and 2)
weeks AS (
    SELECT
        date_trunc('week', updated_at::DATE) AS week_date,
        count(*) AS count_this_week
    FROM reviews
    GROUP BY week_date
)
SELECT
    week_date,
    count_this_week,
    counts.*,
    count_prev_30_distinct_at_least_3.*
FROM
    weeks AS w
    -- a lateral join lets the subquery reference the current row of weeks,
    -- much like a correlated subquery, but it can return several columns;
    -- think of it as a nested loop in Python, for example
    LEFT JOIN LATERAL (
        SELECT
            count(*) AS count_prev_30_all,
            count(DISTINCT r2.about_id) AS count_prev_30_distinct
        FROM
            reviews AS r2
        WHERE
            r2.updated_at::DATE BETWEEN w.week_date - INTERVAL '30 days' AND w.week_date
    ) AS counts ON TRUE
    -- and another one just for column 4; with a bit more effort the query
    -- could be rewritten to use a single lateral join
    LEFT JOIN LATERAL
    (
        SELECT count(*) AS count_prev_30_distinct_at_least_3
        FROM
            (
                SELECT
                    r3.about_id,
                    count(*) AS count
                FROM reviews AS r3
                WHERE r3.updated_at :: DATE BETWEEN w.week_date - INTERVAL '30 days' AND w.week_date
                GROUP BY r3.about_id
            ) AS hlp
        WHERE count >= 3
    ) AS count_prev_30_distinct_at_least_3 ON TRUE
ORDER BY week_date;
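The comment in the query above mentions that both counts could come from a single lateral join. A minimal sketch of that idea follows, assuming PostgreSQL 9.4 or later for the aggregate FILTER clause. The aliases per_movie, n and c are made up for this illustration, the sample rows are a subset of the ones above, and the window uses the half-open bounds from the question (more than 30 days before the week start is excluded) rather than BETWEEN. Treat it as a starting point rather than a tuned solution.

```sql
WITH
-- stand-in for the real reviews table, already filtered to submitted movies
reviews AS (
    SELECT updated_at::date AS updated_at, about_id
    FROM (
        VALUES
            ('2016-10-21', 'Movie', TRUE,  'Enter the Matrix'),
            ('2016-10-22', 'Movie', TRUE,  'Enter the Matrix'),
            ('2016-10-23', 'Movie', TRUE,  'Enter the Matrix'),
            ('2016-11-01', 'Movie', TRUE,  'Citizen Kane'),
            ('2016-11-02', 'Movie', TRUE,  'Citizen Kane'),
            ('2016-11-02', 'Movie', TRUE,  'Citizen Kane'),
            ('2016-11-10', 'Movie', TRUE,  'Blade Runner'),
            ('2016-11-17', 'Album', FALSE, 'The Chronic')
    ) AS t(updated_at, about_type, submitted, about_id)
    WHERE submitted AND about_type = 'Movie'
),
-- one row per week (Monday start) with that week's review count
weeks AS (
    SELECT
        date_trunc('week', updated_at)::date AS week_date,
        count(*) AS count_this_week
    FROM reviews
    GROUP BY 1
)
SELECT
    w.week_date,
    w.count_this_week,
    c.count_prev_30_distinct,
    c.count_prev_30_distinct_at_least_3
FROM weeks AS w
LEFT JOIN LATERAL (
    SELECT
        -- the inner query yields one row per movie, so counting rows
        -- counts distinct movies
        count(*) AS count_prev_30_distinct,
        -- count only the movies with 3 or more reviews in the window
        count(*) FILTER (WHERE per_movie.n >= 3) AS count_prev_30_distinct_at_least_3
    FROM (
        SELECT r.about_id, count(*) AS n
        FROM reviews AS r
        WHERE r.updated_at >  w.week_date - INTERVAL '30 days'
          AND r.updated_at <= w.week_date
        GROUP BY r.about_id
    ) AS per_movie
) AS c ON TRUE
ORDER BY w.week_date;
```

Grouping by movie once and then aggregating the per-movie counts means the 30-day window is scanned a single time per week instead of twice.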