Calculating the average lag between timestamps in BigQuery
My data is structured roughly as follows:
client_id | visit_number | session_start_time | hit_count
I'm currently using:
SELECT client_id, visit_number,
       SUM(hit_count) OVER (PARTITION BY client_id ORDER BY visit_number),
       session_start_time - LAG(session_start_time) OVER (PARTITION BY client_id ORDER BY visit_number)
FROM session_table
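Ideally, I'd like a rolling sum of hits per client (this seems to work fine). The average delta between consecutive sessions would also be handy. I hope my approach to calculating the delta for the current session is right, but I'm not sure of a sensible way to calculate the average delta.
One idea is to wrap the query above in a CTE and compute the average in another window function, but I'm sure it can be done in a single query.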
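If you want the average time between sessions up to the current session, you can compute it by subtracting the first start time from the current start time and dividing by one less than the number of sessions so far. This works because the consecutive deltas telescope: the n - 1 gaps up to session n sum to the current start time minus the first one. NULLIF turns the zero divisor on each client's first session into NULL, so that row yields NULL rather than a division error: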
SELECT client_id, visit_number,
       SUM(hit_count) OVER (PARTITION BY client_id ORDER BY visit_number),
       session_start_time - LAG(session_start_time) OVER (PARTITION BY client_id ORDER BY visit_number) AS delta,
       (session_start_time -
        MIN(session_start_time) OVER (PARTITION BY client_id)
       ) / NULLIF(ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY session_start_time) - 1, 0) AS avg_delta
FROM session_table;
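If you would rather get the deltas as plain numbers of seconds than as intervals, the same idea can be sketched with TIMESTAMP_DIFF. This is only a sketch under assumptions: it assumes session_start_time is a TIMESTAMP column and that visit_number increases in chronological order for each client. The aliases running_hits, delta_seconds and avg_delta_seconds and the window name w are made up for illustration.

```sql
-- Sketch only: table and column names are taken from the question;
-- running_hits, delta_seconds, avg_delta_seconds and w are hypothetical names.
SELECT
  client_id,
  visit_number,
  -- Rolling sum of hits per client, in visit order.
  SUM(hit_count) OVER w AS running_hits,
  -- Seconds since the previous session (NULL on a client's first session).
  TIMESTAMP_DIFF(session_start_time, LAG(session_start_time) OVER w, SECOND) AS delta_seconds,
  -- Seconds since the client's first session, divided by the number of gaps so far.
  -- SAFE_DIVIDE returns NULL when the divisor is 0, i.e. on the first session.
  SAFE_DIVIDE(
    TIMESTAMP_DIFF(session_start_time, FIRST_VALUE(session_start_time) OVER w, SECOND),
    ROW_NUMBER() OVER w - 1
  ) AS avg_delta_seconds
FROM session_table
WINDOW w AS (PARTITION BY client_id ORDER BY visit_number);
```

The named window keeps every expression on the same partitioning and ordering. Using visit_number throughout is a design choice for this sketch; if visit_number and session_start_time can disagree, you would need to pick one ordering deliberately.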