SQL - Return rows after nth occurrence of event per user
I'm using PostgreSQL 8.0 and I have a table with user_id, timestamp, and event_id.
How can I return the row (or rows) that come after the 4th occurrence of event_id = someID, per user?
|---------|------------------|----------|
| user_id | timestamp        | event_id |
|---------|------------------|----------|
| 1       | 2020-04-02 12:00 | 11       |
| 2       | 2020-04-02 13:00 | 11       |
| 2       | 2020-04-02 14:00 | 99       |
| 2       | 2020-04-02 15:00 | 11       |
| 2       | 2020-04-02 16:00 | 11       |
| 2       | 2020-04-02 17:00 | 11       |
| 2       | 2020-04-02 17:00 | 11       |
|---------|------------------|----------|
That is, for event_id = 11, I would only want the last row of the table above.
You can use window functions:
select *
from (
select t.*, row_number() over(partition by user_id, event_id order by timestamp) rn
from mytable t
) t
where rn > 4
Here is a small trick to drop the row number from the result:
select (t).*
from (
select t, row_number() over(partition by user_id, event_id order by timestamp) rn
from mytable t
) x
where rn > 4
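To try the row_number() query without a Postgres server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. Several things here are assumptions rather than part of the answer: SQLite stands in for Postgres (it needs SQLite 3.25 or later for window functions), the column is named ts instead of timestamp, and the last sample row is moved to 18:00 so that every timestamp is unique and the numbering is deterministic.

```python
import sqlite3

# Hypothetical sample data modelled on the question's table. The last row is
# moved to 18:00 (an assumption) so that no two rows share a timestamp.
ROWS = [
    (1, "2020-04-02 12:00", 11),
    (2, "2020-04-02 13:00", 11),
    (2, "2020-04-02 14:00", 99),
    (2, "2020-04-02 15:00", 11),
    (2, "2020-04-02 16:00", 11),
    (2, "2020-04-02 17:00", 11),
    (2, "2020-04-02 18:00", 11),
]

# Same shape as the query above: number each (user_id, event_id) group by
# time and keep everything past the 4th row of its group.
QUERY = """
select user_id, ts, event_id
from (
    select t.*,
           row_number() over (partition by user_id, event_id order by ts) as rn
    from mytable t
) t
where rn > 4
order by user_id, ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (user_id integer, ts text, event_id integer)")
conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
after_fourth = conn.execute(QUERY).fetchall()
conn.close()

print(after_fourth)
```

Because the partition includes event_id, this only returns later rows of the same event. Rows with a different event_id that happen after the 4th occurrence are not returned.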
You can use a cumulative count. This version includes the 4th occurrence itself:
select t.*
from (select t.*,
count(*) filter (where event_id = 11) over (partition by user_id order by timestamp) as event_11_cnt
from t
) t
where event_11_cnt >= 4;
filter has been valid Postgres syntax for a long time (since 9.4), but on older versions you can use this instead:
select t.*
from (select t.*,
sum( (event_id = 11)::int ) over (partition by user_id order by timestamp) as event_11_cnt
from t
) t
where event_11_cnt >= 4;
This version of the where clause does not include the 4th occurrence:
where event_11_cnt > 4 or (event_11_cnt = 4 and event_id <> 11)
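The difference between the two where clauses is easier to see on data. The sketch below is an illustration under assumptions, not the answerer's code: it uses Python's sqlite3 module in place of Postgres (SQLite 3.25 or later), names the column ts, writes the count as a portable sum(case ...), and uses made-up rows in which a non-11 event follows the 4th occurrence.

```python
import sqlite3

# Hypothetical rows: user 2 reaches the 4th event 11 at 17:00, followed by
# an unrelated event (99) and then a 5th event 11.
ROWS = [
    (1, "2020-04-02 12:00", 11),
    (2, "2020-04-02 13:00", 11),
    (2, "2020-04-02 14:00", 99),
    (2, "2020-04-02 15:00", 11),
    (2, "2020-04-02 16:00", 11),
    (2, "2020-04-02 17:00", 11),  # 4th occurrence of event 11 for user 2
    (2, "2020-04-02 18:00", 99),
    (2, "2020-04-02 19:00", 11),
]

# Running count of event 11 per user, ordered by time.
BASE = """
select user_id, ts, event_id
from (
    select t.*,
           sum(case when event_id = 11 then 1 else 0 end)
               over (partition by user_id order by ts) as event_11_cnt
    from t
) t
where {condition}
order by user_id, ts
"""

INCLUSIVE = "event_11_cnt >= 4"
EXCLUSIVE = "event_11_cnt > 4 or (event_11_cnt = 4 and event_id <> 11)"

conn = sqlite3.connect(":memory:")
conn.execute("create table t (user_id integer, ts text, event_id integer)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
with_fourth = conn.execute(BASE.format(condition=INCLUSIVE)).fetchall()
without_fourth = conn.execute(BASE.format(condition=EXCLUSIVE)).fetchall()
conn.close()

print(with_fourth)
print(without_fourth)
```

With these rows, the inclusive condition returns the 17:00, 18:00 and 19:00 rows for user 2, while the exclusive condition drops the 17:00 row and keeps the other two. Note that rows sharing a timestamp are peers under the default window frame and receive the same running count.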
Another approach, using a correlated subquery:
select t.*
from t
where t.timestamp > (select t2.timestamp
from t t2
where t2.user_id = t.user_id and
t2.event_id = 11
order by t2.timestamp
limit 1 offset 3
);
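This form needs no window functions, which Postgres only gained in 8.4, so it may be the most relevant one for a very old server. As a rough check of the logic, the sketch below runs the same query shape through Python's sqlite3 module. SQLite standing in for Postgres, the column name ts, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical rows: user 2 has its 4th event 11 at 17:00; user 1 never
# reaches a 4th occurrence.
ROWS = [
    (1, "2020-04-02 12:00", 11),
    (2, "2020-04-02 13:00", 11),
    (2, "2020-04-02 14:00", 99),
    (2, "2020-04-02 15:00", 11),
    (2, "2020-04-02 16:00", 11),
    (2, "2020-04-02 17:00", 11),
    (2, "2020-04-02 18:00", 99),
    (2, "2020-04-02 19:00", 11),
]

# "limit 1 offset 3" picks the timestamp of the user's 4th event 11. If the
# user has fewer than four, the subquery yields NULL and the comparison
# filters out all of that user's rows.
QUERY = """
select t.user_id, t.ts, t.event_id
from t
where t.ts > (select t2.ts
              from t t2
              where t2.user_id = t.user_id
                and t2.event_id = 11
              order by t2.ts
              limit 1 offset 3)
order by t.user_id, t.ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t (user_id integer, ts text, event_id integer)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
later_rows = conn.execute(QUERY).fetchall()
conn.close()

print(later_rows)
```

Because the comparison is a strict greater-than on the timestamp, a row that shares its timestamp with the 4th occurrence is not returned.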
Sorry for asking about such an old version of Postgres. Here is an answer that worked:
WITH EventOrdered AS(
SELECT
EventTypeId
, UserId
, Timestamp
, ROW_NUMBER() OVER (PARTITION BY EventTypeId, UserId ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ROW_NO
FROM Event),
FourthEvent AS (
SELECT DISTINCT
UserID
, FIRST_VALUE(TimeStamp) OVER (PARTITION BY UserId ORDER BY Timestamp) FirstFourthEventTimestamp
FROM EventOrdered
WHERE ROW_NO = 4)
SELECT e.*
FROM Event e
JOIN FourthEvent ffe
ON e.UserId = ffe.UserId
AND e.Timestamp > ffe.FirstFourthEventTimestamp
ORDER BY e.UserId, e.Timestamp