Group and rank consecutive values by condition in SQL

I have a table, mytable, to which I would like to add two more columns.

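My objective is to group by user_id and mobile_id wherever there is a run of consecutive rows with difftime > -600. The run must be consecutive in created_at (a timestamp), and each row in it is given a rank; the rank restarts if the user and mobile ID are the same but a row with difftime < -600 occurs. Each separate group is assigned an incrementing value. For example: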

> mytable
            created_at user_id mobile_id   status difftime
1  2019-01-02 22:01:38 1227604     68409 finished      \N
2  2019-01-03 04:08:29 1227604     68409 finished     -366
3  2019-01-03 15:16:38 1227604     68409  timeout     -668
4  2019-01-04 00:34:40 1227604     68409   failed     -558
5  2019-01-04 00:27:37 1227605     68453   failed      \N
6  2019-01-04 00:35:56 1227605     68453 finished       -8
7  2019-01-04 01:39:52 1227605     68453 finished      -63
8  2019-01-04 02:05:53 1227605     68453  timeout      -26
9  2019-01-04 02:17:17 1227605     68453  timeout      -11
10 2019-01-04 16:51:39 1227605     68453  timeout     -874

This should produce the following output:
> output
            created_at user_id mobile_id   status difftime group rank
1  2019-01-02 22:01:38 1227604     68409 finished      \N    NA   NA
2  2019-01-03 04:08:29 1227604     68409 finished     -366     1    1
3  2019-01-03 15:16:38 1227604     68409  timeout     -668    NA   NA
4  2019-01-04 00:34:40 1227604     68409   failed     -558     2    1
5  2019-01-04 00:27:37 1227605     68453   failed      \N    NA   NA
6  2019-01-04 00:35:56 1227605     68453 finished       -8     3    1
7  2019-01-04 01:39:52 1227605     68453 finished      -63     3    2
8  2019-01-04 02:05:53 1227605     68453  timeout      -26     3    3
9  2019-01-04 02:17:17 1227605     68453  timeout      -11     3    4
10 2019-01-04 16:51:39 1227605     68453  timeout     -874    NA   NA

When I try to assign just the rank, the following query raises the error: WHERE clause cannot contain aggregations, window functions or grouping operations

Although I am using Presto SQL, any SQL solution would help me think about how to restructure the query.

SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY user_id, mobile_id ORDER BY created_at) as rank
    from mytable
    WHERE DATE_DIFF('minute', created_at, lag(created_at) OVER (PARTITION BY user_id, mobile_id ORDER BY user_id, created_at)) > -600
    ORDER BY user_id, mobile_id, created_at

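To identify the groups, take a cumulative sum of the "invalid" values (rows where difftime is NULL or not greater than -600). Then use dense_rank() to assign the group value.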

I don't see how your query relates to your question, but the logic looks like this:

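select t.*,
       -- group number; rows that break a run get NULL
       (case when difftime > -600
             then dense_rank() over (order by user_id, mobile_id, breaks)
        end) as grp,
       -- rank: running count of valid rows inside the current run
       (case when difftime > -600
             then count(case when difftime > -600 then 1 end) over (partition by user_id, mobile_id, breaks order by created_at)
        end) as rank
from (select t.*,
             -- cumulative sum of "invalid" rows (difftime NULL or <= -600); constant within a run
             sum(case when difftime > -600 then 0 else 1 end) over (partition by user_id, mobile_id order by created_at) as breaks
      from mytable t
     ) t
order by user_id, mobile_id, created_at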
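
To sanity-check the cumulative-sum idea against the sample data from the question, here is a minimal sketch that runs it through Python's built-in sqlite3 module rather than Presto. It assumes an SQLite build with window-function support (3.25 or later). The names group_and_rank, breaks, grp and rnk are my own, chosen for illustration. A row counts as valid when difftime > -600; a NULL or smaller difftime is treated as a break.

```python
import sqlite3

# Sample rows from the question:
# (created_at, user_id, mobile_id, status, difftime)
ROWS = [
    ("2019-01-02 22:01:38", 1227604, 68409, "finished", None),
    ("2019-01-03 04:08:29", 1227604, 68409, "finished", -366),
    ("2019-01-03 15:16:38", 1227604, 68409, "timeout", -668),
    ("2019-01-04 00:34:40", 1227604, 68409, "failed", -558),
    ("2019-01-04 00:27:37", 1227605, 68453, "failed", None),
    ("2019-01-04 00:35:56", 1227605, 68453, "finished", -8),
    ("2019-01-04 01:39:52", 1227605, 68453, "finished", -63),
    ("2019-01-04 02:05:53", 1227605, 68453, "timeout", -26),
    ("2019-01-04 02:17:17", 1227605, 68453, "timeout", -11),
    ("2019-01-04 16:51:39", 1227605, 68453, "timeout", -874),
]

QUERY = """
select created_at, user_id, mobile_id, status, difftime,
       -- group number, NULL on rows that break a run
       case when difftime > -600
            then dense_rank() over (order by user_id, mobile_id, breaks)
       end as grp,
       -- running count of valid rows inside the current run
       case when difftime > -600
            then count(case when difftime > -600 then 1 end)
                 over (partition by user_id, mobile_id, breaks
                       order by created_at)
       end as rnk
from (select t.*,
             -- cumulative count of breaking rows per (user_id, mobile_id)
             sum(case when difftime > -600 then 0 else 1 end)
                 over (partition by user_id, mobile_id
                       order by created_at) as breaks
      from mytable t) t
order by user_id, mobile_id, created_at
"""


def group_and_rank(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (created_at text, user_id integer, "
        "mobile_id integer, status text, difftime integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in group_and_rank(ROWS):
        print(row)
```

One caveat with this sketch: dense_rank() numbers every run, including runs that contain only a breaking row, so group numbers can skip a value when two breaking rows follow each other or when a user's last row is a break. If gap-free numbering matters, the group column would need a further renumbering step.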