Redshift SQL Window Function frame_clause with days
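I am trying to run a window function on a dataset in Redshift, using a number of days, rather than a number of rows, as the interval for the preceding rows.
Sample data: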
date ID score
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 3
3/5/2017 555 2
SQL window function for the average of the last 3 scores:
select
date,
id,
avg(score) over
(partition by id order by date rows
between 2 preceding and
current row) LAST_3_SCORES_AVG
from DATASET
Result:
date ID LAST_3_SCORES_AVG
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 2
3/5/2017 555 2
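The problem is that I want the average score over the last 3 days (a moving average), not over the last three tests. I have gone through the Redshift and PostgreSQL documentation but can't find any way to do this.
Desired result: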
date ID 3_DAY_AVG
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 2
3/5/2017 555 2.5
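Any pointers would be appreciated.
You can use lag() and compute the average explicitly. This works when each id has at most one row per day, so that at most two earlier rows can fall inside the 3-day window.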
select t.*,
(score +
(case when lag(date, 1) over (partition by id order by date) >=
date - interval '2 day'
then lag(score, 1) over (partition by id order by date)
else 0
end) +
(case when lag(date, 2) over (partition by id order by date) >=
date - interval '2 day'
then lag(score, 2) over (partition by id order by date)
else 0
end)
) /
(1.0 +  -- 1.0 avoids integer division if score is an integer column
(case when lag(date, 1) over (partition by id order by date) >=
date - interval '2 day'
then 1
else 0
end) +
(case when lag(date, 2) over (partition by id order by date) >=
date - interval '2 day'
then 1
else 0
end)
)
from dataset t;
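To sanity-check the window the lag() query relies on (the current date minus 2 days, through the current date), here is a minimal Python sketch that brute-forces the 3-day average on the sample data from the question. The function name three_day_avg and the tuple layout are my own illustrative assumptions, not part of the answer above.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, id, score)
ROWS = [
    (date(2017, 3, 1), 123, 1),
    (date(2017, 3, 1), 555, 1),
    (date(2017, 3, 2), 123, 1),
    (date(2017, 3, 3), 555, 3),
    (date(2017, 3, 5), 555, 2),
]


def three_day_avg(rows, days=3):
    """For each row, average the scores of the same id whose date falls in
    [row date - (days - 1), row date]. Brute force, O(n^2)."""
    out = []
    for d, rid, _ in rows:
        start = d - timedelta(days=days - 1)
        window = [s for (d2, rid2, s) in rows if rid2 == rid and start <= d2 <= d]
        out.append((d, rid, sum(window) / len(window)))
    return out


RESULT = three_day_avg(ROWS)
for d, rid, avg in RESULT:
    print(d.isoformat(), rid, avg)
```

On the sample data this gives 2.0 for id 555 on 3/3 and 2.5 on 3/5, which matches the desired result in the question.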
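In many (or all) cases, the following approach can be used in place of the RANGE window option.
You can introduce an "expiry" record for each input record. The expiry record negates the original record, so when you aggregate all preceding records, only the records within the desired range are counted.
AVG is a little harder because it has no direct inverse, so we need to treat it as SUM/COUNT and negate both.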
SELECT id, date, running_avg_score
FROM
(
SELECT id, date, n,
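-- order by n as well, so an expiry row (-1) dated the same day is applied before the original row (+1)
-- 1.0 * avoids integer division if score is an integer column
1.0 * SUM(score) OVER (PARTITION BY id ORDER BY date, n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
/ NULLIF(SUM(n) OVER (PARTITION BY id ORDER BY date, n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) as running_avg_score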
FROM
(
SELECT date, id, score, 1 as n
FROM DATASET
UNION ALL
-- expiry and negate
SELECT DATEADD(DAY, 3, date), id, -1 * score, -1
FROM DATASET
) e
) a
WHERE a.n = 1
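The expiry idea may be easier to follow outside SQL. Below is a rough Python sketch of the same bookkeeping, assuming the sample data from the question: every record produces a (+score, +1) event on its own date and a (-score, -1) event 3 days later, and a running SUM/COUNT over the sorted events yields the moving average. The names expiry_running_avg and ROWS are hypothetical, chosen only for this illustration.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, id, score)
ROWS = [
    (date(2017, 3, 1), 123, 1),
    (date(2017, 3, 1), 555, 1),
    (date(2017, 3, 2), 123, 1),
    (date(2017, 3, 3), 555, 3),
    (date(2017, 3, 5), 555, 2),
]


def expiry_running_avg(rows, days=3):
    """Moving average per id using expiry events and running sums."""
    events = []
    for d, rid, s in rows:
        events.append((rid, d, 1, s))                           # original record
        events.append((rid, d + timedelta(days=days), -1, -s))  # expiry cancels it
    # Sort by id, date, then n, so an expiry (-1) that falls on the same day
    # as an original record (+1) is applied first.
    events.sort(key=lambda e: (e[0], e[1], e[2]))

    totals = {}  # id -> [running sum of score, running sum of n]
    out = []
    for rid, d, n, s in events:
        tot = totals.setdefault(rid, [0, 0])
        tot[0] += s
        tot[1] += n
        if n == 1:  # keep only the original rows, like WHERE a.n = 1
            out.append((d, rid, tot[0] / tot[1]))
    return out


AVERAGES = expiry_running_avg(ROWS)
for d, rid, avg in AVERAGES:
    print(d.isoformat(), rid, avg)
```

The output comes back grouped by id rather than in the original row order, but the averages agree with the desired result, including 2.5 for id 555 on 3/5.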