Group and rank consecutive values by condition in SQL
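I have a table mytable, to which I want to add two more columns.
My objective is to group by user_id and mobile_id only where there is a consecutive sequence of rows with difftime > -600. The sequence must be consecutive in created_at (a timestamp) and is given a rank that restarts when the same user and mobile ID hits a difftime < -600. Each separate group is assigned an incrementing value. For example: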
> mytable
            created_at user_id mobile_id   status difftime
1  2019-01-02 22:01:38 1227604     68409 finished       \N
2  2019-01-03 04:08:29 1227604     68409 finished     -366
3  2019-01-03 15:16:38 1227604     68409  timeout     -668
4  2019-01-04 00:34:40 1227604     68409   failed     -558
5  2019-01-04 00:27:37 1227605     68453   failed       \N
6  2019-01-04 00:35:56 1227605     68453 finished       -8
7  2019-01-04 01:39:52 1227605     68453 finished      -63
8  2019-01-04 02:05:53 1227605     68453  timeout      -26
9  2019-01-04 02:17:17 1227605     68453  timeout      -11
10 2019-01-04 16:51:39 1227605     68453  timeout     -874
would produce the output:
> output
            created_at user_id mobile_id   status difftime group rank
1  2019-01-02 22:01:38 1227604     68409 finished       \N    NA   NA
2  2019-01-03 04:08:29 1227604     68409 finished     -366     1    1
3  2019-01-03 15:16:38 1227604     68409  timeout     -668    NA   NA
4  2019-01-04 00:34:40 1227604     68409   failed     -558     2    1
5  2019-01-04 00:27:37 1227605     68453   failed       \N    NA   NA
6  2019-01-04 00:35:56 1227605     68453 finished       -8     3    1
7  2019-01-04 01:39:52 1227605     68453 finished      -63     3    2
8  2019-01-04 02:05:53 1227605     68453  timeout      -26     3    3
9  2019-01-04 02:17:17 1227605     68453  timeout      -11     3    4
10 2019-01-04 16:51:39 1227605     68453  timeout     -874    NA   NA
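When I try to assign just the rank, the following query raises an error: WHERE clause cannot contain aggregations, window functions or grouping operations
Although I am using Presto SQL, any SQL solution would help me think about how to restructure the query.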
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id, mobile_id ORDER BY created_at) as rank
from mytable
WHERE DATE_DIFF('minute', created_at, lag(created_at) OVER (PARTITION BY user_id, mobile_id ORDER BY user_id, created_at)) > -600
ORDER BY user_id, mobile_id, created_at
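On the error itself: window functions are evaluated after WHERE, so a WHERE clause at the same query level cannot reference them. The usual workaround is to compute the lag() expression in a subquery and filter on the resulting column outside it. Below is a minimal sketch of that restructuring, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for Presto. The strftime arithmetic is my substitute for Presto's DATE_DIFF, and only the first user's rows are loaded.

```python
import sqlite3

# First four sample rows from the question: (created_at, user_id, mobile_id)
rows = [
    ("2019-01-02 22:01:38", 1227604, 68409),
    ("2019-01-03 04:08:29", 1227604, 68409),
    ("2019-01-03 15:16:38", 1227604, 68409),
    ("2019-01-04 00:34:40", 1227604, 68409),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (created_at TEXT, user_id INTEGER, mobile_id INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

query = """
SELECT created_at, user_id, mobile_id, difftime
FROM (
    SELECT t.*,
           -- SQLite stand-in for DATE_DIFF('minute', created_at, lag(created_at)):
           -- whole minutes from the current row back to the previous row (negative)
           (CAST(strftime('%s', LAG(created_at) OVER (
                     PARTITION BY user_id, mobile_id ORDER BY created_at)) AS INTEGER)
            - CAST(strftime('%s', created_at) AS INTEGER)) / 60 AS difftime
    FROM mytable AS t
) AS d
WHERE difftime > -600   -- allowed here: difftime is an ordinary column at this level
ORDER BY user_id, mobile_id, created_at
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

This only removes the error. The first row of each partition has a NULL difftime and is filtered out as well, and a plain filter does not yet restart the rank after a gap.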
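To identify the groups, take a cumulative sum over the "invalid" values. Then use dense_rank() to assign the group number.
I don't know what your query has to do with your question, but the logic is this: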
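select t.*,
       -- number the valid runs globally (1, 2, 3, ...); invalid rows stay NULL
       (case when is_valid = 1
             then dense_rank() over (partition by is_valid order by user_id, mobile_id, island)
        end) as grp,
       -- restart the rank at 1 inside each run
       (case when is_valid = 1
             then row_number() over (partition by user_id, mobile_id, island, is_valid order by created_at)
        end) as rnk
from (select t.*,
             (case when difftime > -600 then 1 else 0 end) as is_valid,
             -- running count of "invalid" rows (NULL or difftime <= -600); constant across each valid run
             sum(case when difftime > -600 then 0 else 1 end) over (partition by user_id, mobile_id order by created_at) as island
      from mytable t
     ) t
order by user_id, mobile_id, created_at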
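The cumulative-sum idea can be checked against the question's sample data. What follows is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for Presto. The names flagged, is_valid, island, grp and rnk are illustrative and do not come from the original post. Here the running sum counts the invalid rows (NULL or difftime <= -600), so its value stays constant across each run of valid rows and can serve as the run's key.

```python
import sqlite3

# Sample rows from the question: (created_at, user_id, mobile_id, status, difftime)
rows = [
    ("2019-01-02 22:01:38", 1227604, 68409, "finished", None),
    ("2019-01-03 04:08:29", 1227604, 68409, "finished", -366),
    ("2019-01-03 15:16:38", 1227604, 68409, "timeout", -668),
    ("2019-01-04 00:34:40", 1227604, 68409, "failed", -558),
    ("2019-01-04 00:27:37", 1227605, 68453, "failed", None),
    ("2019-01-04 00:35:56", 1227605, 68453, "finished", -8),
    ("2019-01-04 01:39:52", 1227605, 68453, "finished", -63),
    ("2019-01-04 02:05:53", 1227605, 68453, "timeout", -26),
    ("2019-01-04 02:17:17", 1227605, 68453, "timeout", -11),
    ("2019-01-04 16:51:39", 1227605, 68453, "timeout", -874),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (created_at TEXT, user_id INTEGER, "
    "mobile_id INTEGER, status TEXT, difftime INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)

query = """
WITH flagged AS (
    SELECT t.*,
           CASE WHEN difftime > -600 THEN 1 ELSE 0 END AS is_valid,
           -- running count of invalid rows per (user_id, mobile_id)
           SUM(CASE WHEN difftime > -600 THEN 0 ELSE 1 END)
               OVER (PARTITION BY user_id, mobile_id ORDER BY created_at) AS island
    FROM mytable AS t
)
SELECT created_at, user_id, mobile_id, status, difftime,
       -- global group number, computed over valid rows only
       CASE WHEN is_valid = 1
            THEN DENSE_RANK() OVER (PARTITION BY is_valid
                                    ORDER BY user_id, mobile_id, island)
       END AS grp,
       -- rank that restarts at 1 inside each run of valid rows
       CASE WHEN is_valid = 1
            THEN ROW_NUMBER() OVER (PARTITION BY user_id, mobile_id, island, is_valid
                                    ORDER BY created_at)
       END AS rnk
FROM flagged
ORDER BY user_id, mobile_id, created_at
"""
# Keep only the two computed columns, in row order
result = [(grp, rnk) for *_, grp, rnk in conn.execute(query)]
for pair in result:
    print(pair)
```

With the sample rows this gives groups 1, 2 and 3 with ranks 1, 1 and 1 to 4, and NULL for the rows the question marks as NA. Partitioning the dense_rank() by is_valid keeps the group numbers gap-free even when two invalid rows occur back to back.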