Adding grouping in the window framing clause while creating partitions
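Taking the dataset hosted on Google (MLB data) as an example, this is what I am trying to do: get the runs scored over the last 3 weeks for a given venue.
My aggregated dataset looks like this, without the strikes_3wk column:
The logic for the strikes_3wk column is to partition the aggregated dataset by venueName, order it by the YearWeek column, and then sum the strikes over the most recent 3 weeks.
This is the query I have written so far. I can see that the window function is where I need to change the logic. So, is there a way to add grouping inside the window function? Is there any other way to do this?
In the image I have added a new column, 'expected', showing the values for two of the weeks.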
select inr.*
      ,sum(inr.strikes) over (Venue_Week rows between current row and 2 following) as strikes_3wk
from
(
    select seasonType
          ,gameStatus
          ,homeTeamName
          ,awayTeamName
          ,venueName
          ,CAST(
               CONCAT(
                   CAST(EXTRACT(YEAR FROM createdAt) as string)
                  ,CAST(EXTRACT(WEEK(Monday) FROM createdAt) as string)
               ) as INT64)
           as YearWeek
          ,sum(homeFinalRuns) as homeFinalRuns
          ,sum(strikes) as strikes
    from `bigquery-public-data.baseball.games_wide`
    where createdAt is not null
    group by seasonType
            ,gameStatus
            ,homeTeamName
            ,awayTeamName
            ,venueName
            ,YearWeek
) inr
window Venue_Week as (
    partition by inr.venueName
    order by inr.YearWeek desc
)
So you are looking for the strikes per venue, regardless of which teams were playing, right?
Maybe something like this:
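SELECT INR.*, STATS.strikes_3wk
FROM `bigquery-public-data.baseball.games_wide` INR
LEFT JOIN (
    SELECT venueName, SUM(strikes) AS strikes_3wk
    FROM `bigquery-public-data.baseball.games_wide` INR2
    -- YearWeek is not a column of games_wide; it has to be derived
    -- first, as in the question's inner query.
    WHERE YearWeek IN (
        -- BigQuery has no TOP keyword; ORDER BY ... LIMIT 3 is used instead.
        -- DISTINCT keeps one entry per week, since the table has many rows per week.
        SELECT DISTINCT YearWeek
        FROM `bigquery-public-data.baseball.games_wide`
        WHERE venueName = INR2.venueName
        ORDER BY YearWeek DESC
        LIMIT 3
    )
    GROUP BY venueName
) STATS
ON INR.venueName = STATS.venueName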
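A window frame counts rows, not groups, so one way to get a per-week rolling total is to aggregate to a single row per venue and week first and apply the window afterwards. The query below is a minimal sketch of that idea, not a tested solution. It assumes createdAt is a TIMESTAMP, and the names venue_week and weekStart are introduced here for illustration only. It uses the Monday that starts each week as the ordering key, which sorts correctly across year boundaries, where a concatenated year and week number does not.

```sql
-- Step 1: collapse the data to one row per venue per week.
WITH venue_week AS (
  SELECT
    venueName,
    -- Monday that starts the week; hypothetical column name
    DATE_TRUNC(DATE(createdAt), WEEK(MONDAY)) AS weekStart,
    SUM(strikes) AS strikes
  FROM `bigquery-public-data.baseball.games_wide`
  WHERE createdAt IS NOT NULL
  GROUP BY venueName, weekStart
)
-- Step 2: with one row per week, a 3-row frame covers 3 weeks.
SELECT
  venueName,
  weekStart,
  strikes,
  SUM(strikes) OVER (
    PARTITION BY venueName
    ORDER BY weekStart
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS strikes_3wk
FROM venue_week
ORDER BY venueName, weekStart
```

Note that the frame covers the current week and the two previous weeks that have data for the venue, so a gap in the schedule widens the calendar span. If the per-team detail rows are still needed, this result can be joined back to them on venueName and the week key.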