Firebase Event Occurrences for New Installed Users in BigQuery
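Given a user's install date, I want to get the Firebase (1) Event Occurrences and (2) Event Distinct Users' Count for all 200+ of our Firebase events from Day 0 to Day 30. I mocked up the output table (for D0-D30) in the screenshot below, but the code only covers Day 0 to Day 7.
(1) Event Occurrences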
SELECT
event.name as event_name,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170802' THEN event_count END) AS D0_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170802' AND _TABLE_SUFFIX < '20170803' THEN event_count END) AS D1_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170803' AND _TABLE_SUFFIX < '20170804' THEN event_count END) AS D2_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170804' AND _TABLE_SUFFIX < '20170805' THEN event_count END) AS D3_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170805' AND _TABLE_SUFFIX < '20170806' THEN event_count END) AS D4_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170806' AND _TABLE_SUFFIX < '20170807' THEN event_count END) AS D5_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170807' AND _TABLE_SUFFIX < '20170808' THEN event_count END) AS D6_USERS,
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170808' AND _TABLE_SUFFIX < '20170809' THEN event_count END) AS D7_USERS
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE
_TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170809' AND
user_dim.first_open_timestamp_micros BETWEEN 1501545600000000 AND 1501632000000000
GROUP BY 1;
and
(2) Event Distinct Users' Count
SELECT
event.name as event_name,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170802' THEN user_dim.app_info.app_instance_id END) AS D0_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170802' AND _TABLE_SUFFIX < '20170803' THEN user_dim.app_info.app_instance_id END) AS D1_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170803' AND _TABLE_SUFFIX < '20170804' THEN user_dim.app_info.app_instance_id END) AS D2_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170804' AND _TABLE_SUFFIX < '20170805' THEN user_dim.app_info.app_instance_id END) AS D3_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170805' AND _TABLE_SUFFIX < '20170806' THEN user_dim.app_info.app_instance_id END) AS D4_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170806' AND _TABLE_SUFFIX < '20170807' THEN user_dim.app_info.app_instance_id END) AS D5_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170807' AND _TABLE_SUFFIX < '20170808' THEN user_dim.app_info.app_instance_id END) AS D6_USERS,
COUNT(DISTINCT CASE WHEN _TABLE_SUFFIX >= '20170808' AND _TABLE_SUFFIX < '20170809' THEN user_dim.app_info.app_instance_id END) AS D7_USERS
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE
_TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170809'
AND user_dim.first_open_timestamp_micros BETWEEN 1501545600000000 AND 1501632000000000
GROUP BY 1;
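Question:
- Is there a more optimised way to write this? For a small number of columns (D0-D7) it makes sense, but for D0-D30 I think there may be a better way. Any suggestions are much appreciated!
Final answer after Mikhail's feedback:
I combined the two queries into one and then built a pivot table from the result. Remember to select "Standard SQL" in the BigQuery editor before running it.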
SELECT
event.name AS event_name,
_TABLE_SUFFIX as day,
COUNT(1) as event_occurances,
COUNT(DISTINCT user_dim.app_info.app_instance_id) as event_unique_users
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE
_TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170901' AND
user_dim.first_open_timestamp_micros BETWEEN 1501545600000000 AND 1501632000000000
GROUP BY event_name, day
ORDER BY event_name;
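Appendix notes:
Timestamp conversion for 1 August 2017 (00:00:00 UTC)
- Epoch timestamp: 1501545600
- Timestamp in milliseconds: 1501545600000
Timestamp conversion for 2 August 2017 (00:00:00 UTC)
- Epoch timestamp: 1501632000
- Timestamp in milliseconds: 1501632000000
Note that first_open_timestamp_micros is in microseconds, so the queries use the epoch values multiplied by 1,000,000 (1501545600000000 and 1501632000000000).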
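In case it helps, the boundary values used in the queries can be reproduced with a few lines of Python. This is only a sketch: it assumes the install day is bounded by midnight UTC, and the helper name `utc_midnight_micros` is made up for illustration.

```python
from datetime import datetime, timezone


def utc_midnight_micros(year, month, day):
    """Return midnight UTC of the given date as microseconds since the epoch."""
    midnight = datetime(year, month, day, tzinfo=timezone.utc)
    # timestamp() gives epoch seconds; the Firebase field is in microseconds.
    return int(midnight.timestamp()) * 1_000_000


start = utc_midnight_micros(2017, 8, 1)  # lower bound of the BETWEEN filter
end = utc_midnight_micros(2017, 8, 2)    # upper bound of the BETWEEN filter
print(start, end)  # 1501545600000000 1501632000000000
```

Keep in mind that BETWEEN is inclusive on both ends, so a user whose first open is exactly at midnight on 2 August would also match.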
Is there a more optimised way to write this?
1. One way to optimize it is to rewrite this
COUNT(CASE WHEN _TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170802' THEN event_count END) AS D0_USERS
as this
COUNTIF(_TABLE_SUFFIX = '20170801') AS D0_USERS
:o( For the D0-D30 case you still need to write this line 31 times, but at least it is less heavy
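If you do go this route, you do not have to type the 31 lines by hand. A rough sketch in Python that generates them is below; the function name `countif_columns` and its parameters are hypothetical, and the output is just text to paste into the SELECT list.

```python
from datetime import date, timedelta


def countif_columns(install_date, last_day):
    """Build one COUNTIF column per day, from D0 up to and including D<last_day>."""
    columns = []
    for n in range(last_day + 1):
        # Table suffixes are formatted as YYYYMMDD, one table per day.
        suffix = (install_date + timedelta(days=n)).strftime("%Y%m%d")
        columns.append(f"COUNTIF(_TABLE_SUFFIX = '{suffix}') AS D{n}_USERS")
    return ",\n".join(columns)


sql_columns = countif_columns(date(2017, 8, 1), 30)
print(sql_columns)
```

This only saves typing; the query itself is still 31 columns wide.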
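2. The other (proper) way is to follow best practice and separate data retrieval from data visualization
So you can do something like the below to retrieve the needed data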
#standardSQL
SELECT
event.name AS event_name,
_TABLE_SUFFIX as day,
COUNT(1) as users
FROM `<<project-id>>.app_events_*`, UNNEST(event_dim) AS event
WHERE
_TABLE_SUFFIX >= '20170801' AND _TABLE_SUFFIX < '20170809' AND
user_dim.first_open_timestamp_micros BETWEEN 1501545600000000 AND 1501632000000000
GROUP BY event_name, day
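Then you can use any tool you like to pivot the result
For example, with BigQuery Mate you can get a pivot like the one below without leaving the UI
Quick disclosure - I am the author of the BigQuery Mate Chrome Extension
Please note: I did not tweak or change your query logic. I only answered your specific question: is there a more optimised way to write this?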
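The pivot step mentioned above can be done with any tool. As one possible sketch, here it is in plain Python, assuming the query result has been exported as (event_name, day, count) rows. The function name `pivot_by_day` and the sample rows are made up for illustration and are not real data.

```python
from datetime import datetime


def pivot_by_day(rows, install_day, last_day):
    """Turn (event_name, 'YYYYMMDD', count) rows into {event_name: [D0, D1, ...]}."""
    start = datetime.strptime(install_day, "%Y%m%d").date()
    table = {}
    for event_name, day, count in rows:
        # Day offset relative to the install date: 0 for D0, 1 for D1, ...
        offset = (datetime.strptime(day, "%Y%m%d").date() - start).days
        if 0 <= offset <= last_day:
            # Days with no rows for an event stay at 0.
            table.setdefault(event_name, [0] * (last_day + 1))[offset] = count
    return table


# Hypothetical rows in the shape returned by the GROUP BY event_name, day query.
sample_rows = [
    ("session_start", "20170801", 40),
    ("session_start", "20170803", 12),
    ("level_up", "20170802", 7),
]
print(pivot_by_day(sample_rows, "20170801", 3))
# {'session_start': [40, 0, 12, 0], 'level_up': [0, 7, 0, 0]}
```

The query stays the same no matter how many days you need; only the `last_day` argument changes.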