Redshift SQL Window 函数 frame_clause with days

Redshift SQL Window Function frame_clause with days

我正在尝试对 Redshift 中的数据集执行 window 函数,使用天数和前几行的间隔。 示例数据:

date        ID      score
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     3
3/5/2017    555     2

SQL window 最近 3 个分数的平均分数函数:

select
      date,
      id,
      avg(score) over 
         (partition by id order by date rows 
              between preceding 3 and 
                      current row) LAST_3_SCORES_AVG,
from DATASET

结果:

date        ID      LAST_3_SCORES_AVG
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     2
3/5/2017    555     2

问题是我想要最近 3 天(移动平均值)和不是的平均得分最后三项测试。我已经查看了 Redshift 和 Postgre 文档,但似乎找不到任何方法。

期望的结果:

date        ID      3_DAY_AVG
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     2
3/5/2017    555     2.5

如有任何指示,我们将不胜感激。

您可以使用 lag() 并明确计算平均值。

select t.*,
       (score +
        (case when lag(date, 1) over (partition by id order by date) >=
                   date - interval '2 day'
              then lag(score, 1) over (partition by id order by date)
              else 0
         end) +
        (case when lag(date, 2) over (partition by id order by date) >=
                   date - interval '2 day'
              then lag(score, 2) over (partition by id order by date)
              else 0
         end)
        )
       ) /
       (1 +
        (case when lag(date, 1) over (partition by id order by date) >=
                   date - interval '2 day'
              then 1
              else 0
         end) +
        (case when lag(date, 2) over (partition by id order by date) >=
                   date - interval '2 day'
              then 1
              else 0
         end)
       )
from dataset t;

在很多(或所有)情况下,可以使用以下方法代替 RANGE window 选项。 您可以为每个输入记录引入“过期”。到期记录会否定原始记录,因此当您汇总所有之前的记录时,只会考虑所需 范围 中的记录。 AVG 有点难,因为它没有直接对立面,所以我们需要将其视为 SUM/COUNT 并将两者取反。

SELECT id, date, running_avg_score
FROM
(
    SELECT id, date, n,
            SUM(score) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
            / NULLIF(SUM(n) OVER (PARTITION BY id ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) as running_avg_score
    FROM
    (
        SELECT date, id, score, 1 as n   
        FROM DATASET
        UNION ALL
        -- expiry and negate
        SELECT DATEADD(DAY, 3, date), id, -1 * score, -1
        FROM DATASET
    )
) a
WHERE a.n = 1