Count how many times a value appears consecutively in Hive/SQL

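My table has 3 columns. For each userid, with rows ordered by time, I want to count how many times value equals B consecutively, similar to finding the longest sublist of identical values. For example, the data below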

time        userid  value
2016-01-01  1       A
2016-01-02  1       B
2016-01-03  1       B
2016-01-04  2       C
2016-01-05  2       B
2016-01-06  2       B
2016-01-07  2       B
2016-01-08  2       C
2016-01-09  2       B

should return

userid  times
1       2
2       3

Is this even possible in Hive without a user-defined function? I have looked into LAG and LEAD in depth but couldn't find a way. :(

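-- For each userid and value, find the longest run of consecutive rows (ordered by time).
-- Within a run of equal values rn - rn_val stays constant, so it identifies the run.
select      value
           ,userid
           ,max (times) as times

from       (-- one row per run, with its length
            select      value
                       ,userid
                       ,count (*)   as times

            from       (select  value
                               ,userid

                                -- position among all rows of the user
                               ,row_number () over
                                (
                                     partition by userid
                                     order by     time
                                ) as rn

                                -- position among the user's rows with the same value
                               ,row_number () over
                                (
                                    partition by userid,value
                                    order by     time
                                ) as rn_val

                        from    t

                     -- where   value = 'B'
                        ) t

            group by    value
                       ,userid
                       ,rn - rn_val
            ) t

group by    value
           ,userid

order by    value
           ,userid
;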
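
The question asks only for runs of B, with one row per userid. Below is a minimal sketch of that variant, assuming the same table t with columns time, userid and value; the aliases numbered and runs are illustrative names, not part of the original query. The filter is applied only after both row numbers have been computed. A WHERE in the innermost subquery would run before the window functions, which would make rn and rn_val identical and collapse all B rows of a user into a single group.

```sql
-- How rn - rn_val separates runs, traced for userid 2 (ordered by time):
--   value  rn  rn_val  rn - rn_val
--   C      1   1       0
--   B      2   1       1
--   B      3   2       1
--   B      4   3       1
--   C      5   2       3
--   B      6   4       2
-- The three consecutive B rows share the difference 1; the later B gets 2.

select      userid
           ,max (times) as times

from       (-- one row per run of B, with its length
            select      userid
                       ,count (*) as times

            from       (select  userid
                               ,value
                               ,row_number () over (partition by userid       order by time) as rn
                               ,row_number () over (partition by userid,value order by time) as rn_val
                        from    t
                        ) numbered

            -- filter AFTER numbering, so rn still counts the non-B rows
            where       value = 'B'

            group by    userid
                       ,rn - rn_val
            ) runs

group by    userid

order by    userid
;
```

On the sample data from the question this gives 2 for userid 1 and 3 for userid 2.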