Count the number of events before and after an event "A" until another event "A" is encountered in BigQuery?
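I have a table containing dates, events, and users. There is an event named 'A'. I want to find out, in BigQuery SQL, how many times a particular kind of event occurred before and after event 'A'. Event A may occur multiple times, but in both the before and the after direction the count should stop as soon as another event A is encountered.
For example,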
User Date Events
123 2018-02-14 X.Y.A
123 2018-02-12 X.Y.B
134 2018-02-10 Y.Z.A
123 2018-02-11 A
123 2018-02-01 X.Y.Z
134 2018-02-05 X.Y.B
134 2018-02-04 A
123 2018-02-13 A
The output would look something like this:
User Event Before After
123 A 1 1
123 A 0 1
134 A 0 1
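Everything else remains the same.
This question is an extension of my previous question.
See that question for details.
The events I have to count share a particular prefix: I have to check for events whose names start with X.Y. followed by some event name. So events of the form X.Y.SomeEvent are the ones I need a counter for. Any suggestions?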
This is a more general problem. You can use the same idea with lag() and lead():
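-- Inner query: keep only the 'A' rows and the prefixed events, and number them per user.
-- Outer query: the gap between consecutive 'A' row numbers is the number of events in between.
select userid,
       (seqnum - lag(seqnum, 1, 0) over (partition by userid order by date) - 1) as before,
       -- coalesce() is used instead of a non-constant lead() default, which not every engine accepts;
       -- cnt + 1 acts as a virtual row just past the user's last event
       (coalesce(lead(seqnum) over (partition by userid order by date), cnt + 1) - seqnum - 1) as after
from (select t.*,
             row_number() over (partition by userid order by date) as seqnum,
             count(*) over (partition by userid) as cnt
      from t
      where event like 'X.Y.%' or event = 'A'
     ) t
where event = 'A';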
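To make the row-number arithmetic concrete, here is a minimal Python sketch of the same idea, run on the sample data from the question. The function name `counts_by_seqnum` and the `ROWS` list are hypothetical, introduced only for illustration; it assumes dates are ISO-formatted strings, so sorting them as text is chronological.

```python
from collections import defaultdict

# Sample rows from the question: (user, date, event)
ROWS = [
    (123, "2018-02-14", "X.Y.A"), (123, "2018-02-12", "X.Y.B"),
    (134, "2018-02-10", "Y.Z.A"), (123, "2018-02-11", "A"),
    (123, "2018-02-01", "X.Y.Z"), (134, "2018-02-05", "X.Y.B"),
    (134, "2018-02-04", "A"), (123, "2018-02-13", "A"),
]

def counts_by_seqnum(rows, prefix="X.Y.", marker="A"):
    """Return (user, date, before, after) for every marker row."""
    per_user = defaultdict(list)
    for user, date, event in rows:
        # Same filter as the inner query: marker rows plus prefixed events
        if event == marker or event.startswith(prefix):
            per_user[user].append((date, event))
    result = []
    for user in sorted(per_user):
        events = sorted(per_user[user])  # chronological order per user
        cnt = len(events)
        # 1-based row numbers (seqnum) of the marker rows
        marks = [i for i, (_, e) in enumerate(events, start=1) if e == marker]
        for k, seq in enumerate(marks):
            prev_seq = marks[k - 1] if k > 0 else 0  # lag default
            next_seq = marks[k + 1] if k + 1 < len(marks) else cnt + 1  # lead default
            result.append((user, events[seq - 1][0],
                           seq - prev_seq - 1, next_seq - seq - 1))
    return result

for row in counts_by_seqnum(ROWS):
    print(row)
```

Using `cnt + 1` as the fallback for the last marker row treats the end of the user's history as a virtual row just past the final event, so the events after the last 'A' are all counted.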
The following is for BigQuery Standard SQL:
#standardSQL
WITH grps AS (
  SELECT user, dt, event,
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after
FROM (
  SELECT dt, user, event,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
-- ORDER BY user
You can test/play with the above using the dummy data from your example, as shown below:
#standardSQL
WITH `project.dataset.events` AS (
  SELECT 123 user, '2018-02-14' dt, 'X.Y.A' event UNION ALL
  SELECT 123, '2018-02-13', 'A' UNION ALL
  SELECT 123, '2018-02-12', 'X.Y.B' UNION ALL
  SELECT 123, '2018-02-11', 'A' UNION ALL
  SELECT 123, '2018-02-01', 'X.Y.Z' UNION ALL
  SELECT 134, '2018-02-10', 'Y.Z.A' UNION ALL
  SELECT 134, '2018-02-05', 'X.Y.B' UNION ALL
  SELECT 134, '2018-02-04', 'A'
), grps AS (
  SELECT user, dt, event,
    COUNTIF(event = 'A') OVER(PARTITION BY user ORDER BY dt) grp
  FROM `project.dataset.events`
)
SELECT dt, user, event, before, after
FROM (
  SELECT dt, user, event,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING) before,
    COUNTIF(event LIKE 'X.Y.%') OVER(PARTITION BY user ORDER BY grp RANGE BETWEEN CURRENT ROW AND CURRENT ROW) after
  FROM grps
)
WHERE event = 'A'
ORDER BY user
with the result:
Row dt user event before after
1 2018-02-11 123 A 1 1
2 2018-02-13 123 A 1 1
3 2018-02-04 134 A 0 1
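The grouping trick in this query (a running count of 'A' per user serves as a group id, and each 'A' row then reads the prefixed-event counts of the previous group and of its own group) can be sketched in plain Python as a cross-check. The names `counts_by_group` and `EVENTS` are hypothetical, and the sketch assumes each user has at most one event per date, so ties in the ordering do not arise.

```python
from collections import Counter

# Same dummy data as in the query: (user, dt, event)
EVENTS = [
    (123, "2018-02-14", "X.Y.A"), (123, "2018-02-13", "A"),
    (123, "2018-02-12", "X.Y.B"), (123, "2018-02-11", "A"),
    (123, "2018-02-01", "X.Y.Z"), (134, "2018-02-10", "Y.Z.A"),
    (134, "2018-02-05", "X.Y.B"), (134, "2018-02-04", "A"),
]

def counts_by_group(rows, prefix="X.Y.", marker="A"):
    """Return (dt, user, before, after) for every marker row."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # by user, then date
    grp = {}              # running count of marker rows per user (the grp column)
    per_group = Counter() # prefixed events per (user, grp)
    markers = []          # (user, dt, grp) for each marker row
    for user, dt, event in ordered:
        if event == marker:
            grp[user] = grp.get(user, 0) + 1
            markers.append((user, dt, grp[user]))
        elif event.startswith(prefix):
            per_group[(user, grp.get(user, 0))] += 1
    # before = prefixed events in the previous group, after = in the marker's own group
    return [(dt, user, per_group[(user, g - 1)], per_group[(user, g)])
            for user, dt, g in markers]

for row in counts_by_group(EVENTS):
    print(row)
```

On the dummy data this yields the same three rows as the result table above.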