Finding average value of three weekdays, broken down on hours
I have an origin-destination table like this in BigQuery, containing the weekday, date, UTC time (hour) and number of trips:
Origin      Destination  Day  Date        Time  Count
NY Station  Downtown     Mon  02.09.2019  15    12
NY Station  Downtown     Mon  02.09.2019  16    10
City libry  Eastside     Mon  02.09.2019  17    10
NY Station  Downtown     Tue  03.09.2019  15    8
NY Station  Downtown     Tue  03.09.2019  16    5
City libry  Eastside     Tue  03.09.2019  17    5
NY Station  Downtown     Wed  04.09.2019  15    8
NY Station  Downtown     Wed  04.09.2019  16    10
City libry  Eastside     Wed  04.09.2019  17    11
I want to get the average of Count
- for each origin-destination pair (NY Station-Downtown and City libry-Eastside)
- for each given time, averaged over Mon-Wed
The output should look something like:
Origin      Destination  Avg_Day  Period                     Time  Avg_Count
NY Station  Downtown     Mon-Wed  Week1 (02.09.19-04.09.19)  15    9,33
NY Station  Downtown     Mon-Wed  Week1 (02.09.19-04.09.19)  16    8,33
City libry  Eastside     Mon-Wed  Week1 (02.09.19-04.09.19)  17    8,67
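Ignore the Avg_Day and Period columns; they are only there to help show which days and dates I want to average over. In other words, the goal is to find the average count for each origin-destination pair at a given time of day on a normal weekday (defined here as Mon-Wed). For example, the average count at time 15 for the NY Station-Downtown pair is 9,33, calculated by averaging the counts at 15:00 on Monday, Tuesday and Wednesday (i.e. the average of 12, 8 and 8).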
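The averaging described above boils down to three steps: keep only the Mon-Wed rows, group them by (Origin, Destination, Time), and average Count within each group. A minimal sketch in plain Python, assuming the sample rows are held in a hypothetical in-memory list `rows` and that the hypothetical set `WEEKDAYS` defines a "normal weekday":

```python
from collections import defaultdict

# Hypothetical in-memory copy of the sample table:
# (Origin, Destination, Day, Date, Time, Count)
rows = [
    ("NY Station", "Downtown", "Mon", "02.09.2019", 15, 12),
    ("NY Station", "Downtown", "Mon", "02.09.2019", 16, 10),
    ("City libry", "Eastside", "Mon", "02.09.2019", 17, 10),
    ("NY Station", "Downtown", "Tue", "03.09.2019", 15, 8),
    ("NY Station", "Downtown", "Tue", "03.09.2019", 16, 5),
    ("City libry", "Eastside", "Tue", "03.09.2019", 17, 5),
    ("NY Station", "Downtown", "Wed", "04.09.2019", 15, 8),
    ("NY Station", "Downtown", "Wed", "04.09.2019", 16, 10),
    ("City libry", "Eastside", "Wed", "04.09.2019", 17, 11),
]

# Assumption: Mon-Wed counts as a "normal weekday" for this example
WEEKDAYS = {"Mon", "Tue", "Wed"}


def average_by_pair_and_hour(rows, days=WEEKDAYS):
    """Average Count per (Origin, Destination, Time) over the given days."""
    counts = defaultdict(list)
    for origin, destination, day, _date, time, count in rows:
        if day in days:  # equivalent of WHERE Day IN (...)
            counts[(origin, destination, time)].append(count)
    # equivalent of GROUP BY ... with ROUND(AVG(Count), 2)
    return {key: round(sum(v) / len(v), 2) for key, v in counts.items()}


averages = average_by_pair_and_hour(rows)
for (origin, destination, time), avg in averages.items():
    print(origin, destination, time, avg)
```

Adding more days to `WEEKDAYS` widens the average without changing the grouping, which mirrors editing the `WHERE` clause in SQL.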
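I have tried variations of CASE and WHERE SQL queries, but I haven't fully grasped the logic of how to build a query for this, so there is no point posting any of them. It may also be necessary to create a temporary table. Can anyone help me? Many thanks.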
The following works for BigQuery Standard SQL:
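#standardSQL
SELECT
  Origin,
  Destination,
  'Mon-Wed' AS Avg_Day,
  -- Format the parsed dates back to dd.mm.yyyy for display
  FORMAT('Week%i (%s-%s)', week,
         FORMAT_DATE('%d.%m.%Y', min_date),
         FORMAT_DATE('%d.%m.%Y', max_date)) AS Period,
  Time,
  Avg_Count
FROM (
  SELECT
    Origin,
    Destination,
    -- Date is a 'dd.mm.yyyy' string; weeks start on Sunday,
    -- so 02.09.2019 (a Monday) falls in week 35
    EXTRACT(WEEK FROM PARSE_DATE('%d.%m.%Y', Date)) AS week,
    -- Compare parsed DATEs rather than strings so MIN/MAX stay
    -- correct when a week spans two months
    MIN(PARSE_DATE('%d.%m.%Y', Date)) AS min_date,
    MAX(PARSE_DATE('%d.%m.%Y', Date)) AS max_date,
    Time,
    -- Average the hourly counts over the selected weekdays
    ROUND(AVG(Count), 2) AS Avg_Count
  FROM `project.dataset.table`
  WHERE Day IN ('Mon', 'Tue', 'Wed')
  GROUP BY Origin, Destination, Time, week
)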
Applied to the sample data in your question, the output is:
Row  Origin      Destination  Avg_Day  Period                          Time  Avg_Count
1    NY Station  Downtown     Mon-Wed  Week35 (02.09.2019-04.09.2019)  15    9.33
2    NY Station  Downtown     Mon-Wed  Week35 (02.09.2019-04.09.2019)  16    8.33
3    City libry  Eastside     Mon-Wed  Week35 (02.09.2019-04.09.2019)  17    8.67
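To reproduce the Avg_Count column locally without BigQuery, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite stands in for BigQuery, the in-memory table name `trips` is hypothetical, and only the core aggregation is run: the cosmetic Avg_Day and Period columns, and BigQuery-specific functions such as `PARSE_DATE` and `FORMAT`, are left out.

```python
import sqlite3

# Throwaway in-memory database; "trips" is a hypothetical table name
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (Origin TEXT, Destination TEXT, Day TEXT,"
    " Date TEXT, Time INTEGER, Count INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("NY Station", "Downtown", "Mon", "02.09.2019", 15, 12),
        ("NY Station", "Downtown", "Mon", "02.09.2019", 16, 10),
        ("City libry", "Eastside", "Mon", "02.09.2019", 17, 10),
        ("NY Station", "Downtown", "Tue", "03.09.2019", 15, 8),
        ("NY Station", "Downtown", "Tue", "03.09.2019", 16, 5),
        ("City libry", "Eastside", "Tue", "03.09.2019", 17, 5),
        ("NY Station", "Downtown", "Wed", "04.09.2019", 15, 8),
        ("NY Station", "Downtown", "Wed", "04.09.2019", 16, 10),
        ("City libry", "Eastside", "Wed", "04.09.2019", 17, 11),
    ],
)

# Core of the BigQuery answer: filter the weekdays, group, average
query = """
SELECT Origin, Destination, Time, ROUND(AVG(Count), 2) AS Avg_Count
FROM trips
WHERE Day IN ('Mon', 'Tue', 'Wed')
GROUP BY Origin, Destination, Time
ORDER BY Time
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

For this sample, the three returned rows match the Avg_Count values in the output above (9.33, 8.33 and 8.67).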