How to write a subquery which references the original table for an aggregation?

I have two tables that track the details of each game and each team's performance in that game. The schema is basically:

Game
- id
- date [TIMESTAMPTZ]
- team_a_id
- team_b_id

TeamStats
- game_id
- team_id
- stat_a [INTEGER]
- stat_b [INTEGER]

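For each game, I'd like to summarize how each team performed in all of its previous games. The desired output looks something like this, where the averages are taken over all games dated before the given game_id:
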
- game_id
- team_a_avg_stat_a
- team_a_avg_stat_b
- team_b_avg_stat_a
- team_b_avg_stat_b
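To pin down what "all previous games" means, here is a small plain-Python reference for the desired output. It is a sketch on hypothetical sample data (four games, three teams): it counts every earlier game a team played, whether it was listed as team A or team B, and returns None wherever SQL's AVG over zero rows would return NULL.

```python
# Hypothetical sample data mirroring the Game and TeamStats tables.
# Dates are ISO-8601 strings, so string comparison matches date order.
games = [  # (id, date, team_a_id, team_b_id)
    (1, "2024-01-01", 1, 2),
    (2, "2024-01-05", 2, 3),
    (3, "2024-01-10", 1, 3),
    (4, "2024-04-20", 3, 1),
]
team_stats = {  # (game_id, team_id) -> (stat_a, stat_b)
    (1, 1): (10, 1), (1, 2): (20, 2),
    (2, 2): (30, 3), (2, 3): (40, 4),
    (3, 1): (50, 5), (3, 3): (60, 6),
    (4, 3): (70, 7), (4, 1): (80, 8),
}
game_date = {gid: date for gid, date, _, _ in games}


def prior_averages(team_id, before):
    """Average (stat_a, stat_b) over every game team_id played before `before`,
    regardless of whether it was team A or team B in that game."""
    rows = [s for (gid, tid), s in team_stats.items()
            if tid == team_id and game_date[gid] < before]
    if not rows:
        return (None, None)  # SQL's AVG over zero rows is NULL
    return (sum(r[0] for r in rows) / len(rows),
            sum(r[1] for r in rows) / len(rows))


def desired_output():
    """One row per game: (game_id, team_a_avg_stat_a, team_a_avg_stat_b,
    team_b_avg_stat_a, team_b_avg_stat_b)."""
    return [(gid, *prior_averages(a, date), *prior_averages(b, date))
            for gid, date, a, b in games]


for row in desired_output():
    print(row)
```

On this data the last row is (4, 50.0, 5.0, 30.0, 3.0): team 3's earlier games are 2 and 3, team 1's are 1 and 3.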

I thought I wanted something like the query below, joining the game table to a subquery that averages a given team's stats over a given time range:

-- Example for team_a, would repeat with another join for team_b
SELECT g.id, ats.avg_stat_a as team_a_avg_stat_a, ats.avg_stat_b as team_a_avg_stat_b
FROM game g
    INNER JOIN LATERAL (
        SELECT game_id, AVG(stat_a) AS avg_stat_a, AVG(stat_b) as avg_stat_b
        FROM teamstats its
            INNER JOIN game ig
                ON its.game_id = ig.id
        WHERE ig.date < g.date AND its.team_id = g.team_a_id
        GROUP BY its.game_id
        ) ats
        ON ats.game_id = g.id;

However, when I try the query above, I get zero rows back. I expected one result row for each row in the game table.
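The zero rows come from the join condition rather than from LATERAL itself: the inner query only sees games dated before g.date and groups them by game_id, so none of its rows can ever carry g.id, and ON ats.game_id = g.id discards all of them. The sketch below checks this on hypothetical sample data, using an in-memory SQLite database as a stand-in for PostgreSQL (dates stored as ISO-8601 text; make_db is a hypothetical helper). SQLite has no LATERAL, so the fixed query uses correlated scalar subqueries instead; in PostgreSQL the corresponding fix is to drop GROUP BY its.game_id and write LEFT JOIN LATERAL (...) ats ON true.

```python
import sqlite3


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the PostgreSQL tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE game (id INTEGER PRIMARY KEY, date TEXT,
                           team_a_id INTEGER, team_b_id INTEGER);
        CREATE TABLE teamstats (game_id INTEGER, team_id INTEGER,
                                stat_a INTEGER, stat_b INTEGER);
        INSERT INTO game VALUES (1, '2024-01-01', 1, 2), (2, '2024-01-05', 2, 3),
                                (3, '2024-01-10', 1, 3), (4, '2024-04-20', 3, 1);
        INSERT INTO teamstats VALUES (1, 1, 10, 1), (1, 2, 20, 2),
                                     (2, 2, 30, 3), (2, 3, 40, 4),
                                     (3, 1, 50, 5), (3, 3, 60, 6),
                                     (4, 3, 70, 7), (4, 1, 80, 8);
    """)
    return con


con = make_db()

# The question's inner query, evaluated by hand for outer game 4
# (date 2024-04-20, team_a_id 3). It returns one row per *earlier* game.
inner = con.execute("""
    SELECT its.game_id, AVG(its.stat_a), AVG(its.stat_b)
    FROM teamstats its JOIN game ig ON its.game_id = ig.id
    WHERE ig.date < ? AND its.team_id = ?
    GROUP BY its.game_id
""", ("2024-04-20", 3)).fetchall()
inner_ids = sorted(row[0] for row in inner)
print(inner_ids)  # game 4 is never in this list, so ON ats.game_id = g.id drops it

# Fix: one aggregate over all earlier games (no GROUP BY game_id) and no
# join on game_id. Correlated scalar subqueries stand in for LATERAL here.
fixed = con.execute("""
    SELECT g.id,
           (SELECT AVG(its.stat_a) FROM teamstats its
              JOIN game ig ON its.game_id = ig.id
             WHERE ig.date < g.date AND its.team_id = g.team_a_id) AS team_a_avg_stat_a,
           (SELECT AVG(its.stat_b) FROM teamstats its
              JOIN game ig ON its.game_id = ig.id
             WHERE ig.date < g.date AND its.team_id = g.team_a_id) AS team_a_avg_stat_b
    FROM game g
    ORDER BY g.id
""").fetchall()
for row in fixed:
    print(row)
```

Because the aggregate has no GROUP BY, it yields exactly one row per outer game, with NULL averages when the team has no earlier games.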

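My initial attempt actually didn't use a lateral join, but when I tried that I got an error message, which led me down the path of correlated subqueries: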

/* ERROR:  invalid reference to FROM-clause entry for table "g"
LINE 8:   WHERE ig.date < g.date AND its.team_id = g.team_a_id
                          ^
HINT:  There is an entry for table "g", but it cannot be referenced from this part of the query. */

What am I missing?
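The error above appears because, without LATERAL, a subquery in FROM is evaluated independently of the other FROM items and so cannot reference g. Another way around it is not to correlate at all: join game to an uncorrelated derived table of (team, date, stats) rows, put the date and team conditions in the outer ON clause, and aggregate per game with conditional AVG, which covers team A and team B in one pass. This sketch runs against an in-memory SQLite database with hypothetical sample data (make_db is a hypothetical helper); the SQL itself uses only standard constructs and should also run in PostgreSQL.

```python
import sqlite3


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the PostgreSQL tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE game (id INTEGER PRIMARY KEY, date TEXT,
                           team_a_id INTEGER, team_b_id INTEGER);
        CREATE TABLE teamstats (game_id INTEGER, team_id INTEGER,
                                stat_a INTEGER, stat_b INTEGER);
        INSERT INTO game VALUES (1, '2024-01-01', 1, 2), (2, '2024-01-05', 2, 3),
                                (3, '2024-01-10', 1, 3), (4, '2024-04-20', 3, 1);
        INSERT INTO teamstats VALUES (1, 1, 10, 1), (1, 2, 20, 2),
                                     (2, 2, 30, 3), (2, 3, 40, 4),
                                     (3, 1, 50, 5), (3, 3, 60, 6),
                                     (4, 3, 70, 7), (4, 1, 80, 8);
    """)
    return con


con = make_db()

# The derived table p never references g; the correlation lives in the
# outer ON clause, which is allowed to reference g.
rows = con.execute("""
    SELECT g.id,
           AVG(CASE WHEN p.team_id = g.team_a_id THEN p.stat_a END) AS team_a_avg_stat_a,
           AVG(CASE WHEN p.team_id = g.team_a_id THEN p.stat_b END) AS team_a_avg_stat_b,
           AVG(CASE WHEN p.team_id = g.team_b_id THEN p.stat_a END) AS team_b_avg_stat_a,
           AVG(CASE WHEN p.team_id = g.team_b_id THEN p.stat_b END) AS team_b_avg_stat_b
    FROM game g
    LEFT JOIN (
        SELECT its.team_id, ig.date, its.stat_a, its.stat_b
        FROM teamstats its JOIN game ig ON its.game_id = ig.id
    ) p ON p.date < g.date AND p.team_id IN (g.team_a_id, g.team_b_id)
    GROUP BY g.id, g.team_a_id, g.team_b_id
    ORDER BY g.id
""").fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps games with no history (game 1 here), and CASE without ELSE yields NULL for the other team's rows, which AVG ignores.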


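Also, there's a constraint I forgot to mention at first: I'd like to be able to restrict the averages to games within a certain period (e.g. 90 days) before the game date.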
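With a join-based query, the 90-day limit is just a lower bound on the date in the ON clause; keeping it in ON rather than WHERE preserves games that have no qualifying history. In PostgreSQL the bound would read p.date >= g.date - interval '90 days'. This sketch uses SQLite's date(g.date, '-90 days') instead, because it runs against an in-memory SQLite database with hypothetical sample data (make_db is a hypothetical helper).

```python
import sqlite3


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the PostgreSQL tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE game (id INTEGER PRIMARY KEY, date TEXT,
                           team_a_id INTEGER, team_b_id INTEGER);
        CREATE TABLE teamstats (game_id INTEGER, team_id INTEGER,
                                stat_a INTEGER, stat_b INTEGER);
        INSERT INTO game VALUES (1, '2024-01-01', 1, 2), (2, '2024-01-05', 2, 3),
                                (3, '2024-01-10', 1, 3), (4, '2024-04-20', 3, 1);
        INSERT INTO teamstats VALUES (1, 1, 10, 1), (1, 2, 20, 2),
                                     (2, 2, 30, 3), (2, 3, 40, 4),
                                     (3, 1, 50, 5), (3, 3, 60, 6),
                                     (4, 3, 70, 7), (4, 1, 80, 8);
    """)
    return con


con = make_db()

# Same shape as a plain "all earlier games" query, plus a lower bound:
# only games in [g.date - 90 days, g.date) count toward the averages.
rows = con.execute("""
    SELECT g.id,
           AVG(CASE WHEN p.team_id = g.team_a_id THEN p.stat_a END) AS team_a_avg_stat_a,
           AVG(CASE WHEN p.team_id = g.team_a_id THEN p.stat_b END) AS team_a_avg_stat_b,
           AVG(CASE WHEN p.team_id = g.team_b_id THEN p.stat_a END) AS team_b_avg_stat_a,
           AVG(CASE WHEN p.team_id = g.team_b_id THEN p.stat_b END) AS team_b_avg_stat_b
    FROM game g
    LEFT JOIN (
        SELECT its.team_id, ig.date, its.stat_a, its.stat_b
        FROM teamstats its JOIN game ig ON its.game_id = ig.id
    ) p ON p.date < g.date
       AND p.date >= date(g.date, '-90 days')
       AND p.team_id IN (g.team_a_id, g.team_b_id)
    GROUP BY g.id, g.team_a_id, g.team_b_id
    ORDER BY g.id
""").fetchall()
for row in rows:
    print(row)
```

Game 4 (2024-04-20) has no games in the preceding 90 days in this data, so its row is kept but all four averages are NULL.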

Hmmm . . . I think window functions do what you want:

select g.*,
       avg(ts_a.stat_a) over (partition by ts_a.team_id order by g.date) as avg_a_a,
       avg(ts_a.stat_b) over (partition by ts_a.team_id order by g.date) as avg_a_b,
       -- team B's averages are partitioned by team B's id
       avg(ts_b.stat_a) over (partition by ts_b.team_id order by g.date) as avg_b_a,
       avg(ts_b.stat_b) over (partition by ts_b.team_id order by g.date) as avg_b_b
from game g join
     teamstats ts_a
     on ts_a.game_id = g.id and ts_a.team_id = g.team_a_id join
     teamstats ts_b
     on ts_b.game_id = g.id and ts_b.team_id = g.team_b_id;
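One thing to check with this version: with ORDER BY and no explicit frame, the default frame runs from the start of the partition up to the current row (including peers with the same date), so each average includes the current game's own stats. The sketch below compares that default frame with an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING frame for team A's stat_a, on hypothetical sample data in an in-memory SQLite database (window functions need SQLite 3.25 or later; make_db is a hypothetical helper).

```python
import sqlite3


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the PostgreSQL tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE game (id INTEGER PRIMARY KEY, date TEXT,
                           team_a_id INTEGER, team_b_id INTEGER);
        CREATE TABLE teamstats (game_id INTEGER, team_id INTEGER,
                                stat_a INTEGER, stat_b INTEGER);
        INSERT INTO game VALUES (1, '2024-01-01', 1, 2), (2, '2024-01-05', 2, 3),
                                (3, '2024-01-10', 1, 3), (4, '2024-04-20', 3, 1);
        INSERT INTO teamstats VALUES (1, 1, 10, 1), (1, 2, 20, 2),
                                     (2, 2, 30, 3), (2, 3, 40, 4),
                                     (3, 1, 50, 5), (3, 3, 60, 6),
                                     (4, 3, 70, 7), (4, 1, 80, 8);
    """)
    return con


con = make_db()

rows = con.execute("""
    SELECT g.id,
           -- default frame: partition start through the current row
           AVG(ts_a.stat_a) OVER (PARTITION BY ts_a.team_id ORDER BY g.date)
               AS default_frame,
           -- explicit frame: partition start through the previous row only
           AVG(ts_a.stat_a) OVER (PARTITION BY ts_a.team_id ORDER BY g.date
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
               AS prior_only
    FROM game g
    JOIN teamstats ts_a ON ts_a.game_id = g.id AND ts_a.team_id = g.team_a_id
    ORDER BY g.id
""").fetchall()
for row in rows:
    print(row)
```

Game 1's default-frame average is simply its own stat_a, while the prior-only frame gives NULL. Game 4 shows a second limitation: partitioning by ts_a.team_id only sees games in which the team was listed as team A, so team 3's earlier games as team B are not counted.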
     

I think you can use window functions, but you need a rows frame so that only previous games are considered:

-- "rows between unbounded preceding and 1 preceding" = all earlier rows, excluding the current game
select g.id, 
    avg(tsa.stat_a) over(
        partition by tsa.team_id 
        order by g.date rows between unbounded preceding and 1 preceding
    ) team_a_avg_stat_a,
    avg(tsa.stat_b) over(
        partition by tsa.team_id 
        order by g.date rows between unbounded preceding and 1 preceding
    ) team_a_avg_stat_b,
    avg(tsb.stat_a) over(
        partition by tsb.team_id 
        order by g.date rows between unbounded preceding and 1 preceding
    ) team_b_avg_stat_a,
    avg(tsb.stat_b) over(
        partition by tsb.team_id 
        order by g.date rows between unbounded preceding and 1 preceding
    ) team_b_avg_stat_b
from game g
inner join teamstats tsa 
    on  tsa.game_id = g.id
    and tsa.team_id = g.team_a_id
inner join teamstats tsb
    on  tsb.game_id = g.id
    and tsb.team_id = g.team_b_id;
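The rows frame excludes the current game, but partitioning by tsa.team_id (or tsb.team_id) still averages only the games a team played on that side. A sketch of one way to count every game: compute the windowed average once per teamstats row (one row per team per game, partitioned by team and ordered by game date), then join that result back to game twice, once per side. It runs against an in-memory SQLite database with hypothetical sample data (make_db is a hypothetical helper); the CTE and window syntax are the same in PostgreSQL.

```python
import sqlite3


def make_db():
    """Hypothetical helper: in-memory SQLite stand-in for the PostgreSQL tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE game (id INTEGER PRIMARY KEY, date TEXT,
                           team_a_id INTEGER, team_b_id INTEGER);
        CREATE TABLE teamstats (game_id INTEGER, team_id INTEGER,
                                stat_a INTEGER, stat_b INTEGER);
        INSERT INTO game VALUES (1, '2024-01-01', 1, 2), (2, '2024-01-05', 2, 3),
                                (3, '2024-01-10', 1, 3), (4, '2024-04-20', 3, 1);
        INSERT INTO teamstats VALUES (1, 1, 10, 1), (1, 2, 20, 2),
                                     (2, 2, 30, 3), (2, 3, 40, 4),
                                     (3, 1, 50, 5), (3, 3, 60, 6),
                                     (4, 3, 70, 7), (4, 1, 80, 8);
    """)
    return con


con = make_db()

rows = con.execute("""
    WITH per_team AS (
        -- one row per (game, team); the window spans all of that team's games
        SELECT ts.game_id, ts.team_id,
               AVG(ts.stat_a) OVER (PARTITION BY ts.team_id ORDER BY g.date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS avg_stat_a,
               AVG(ts.stat_b) OVER (PARTITION BY ts.team_id ORDER BY g.date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS avg_stat_b
        FROM teamstats ts JOIN game g ON g.id = ts.game_id
    )
    SELECT g.id,
           a.avg_stat_a AS team_a_avg_stat_a, a.avg_stat_b AS team_a_avg_stat_b,
           b.avg_stat_a AS team_b_avg_stat_a, b.avg_stat_b AS team_b_avg_stat_b
    FROM game g
    JOIN per_team a ON a.game_id = g.id AND a.team_id = g.team_a_id
    JOIN per_team b ON b.game_id = g.id AND b.team_id = g.team_b_id
    ORDER BY g.id
""").fetchall()
for row in rows:
    print(row)
```

If a team can play twice on the same date, the order of those rows within a ROWS frame is not determined by ORDER BY g.date alone; adding a tiebreaker such as ts.game_id makes the result deterministic.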