Count how many times a value appears consecutively in Hive/SQL
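My table has 3 columns. For each userid, with rows ordered by time, I want to count the largest number of times that value is consecutively equal to B, similar to finding the longest sublist of identical values. For example, the following data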
time userid value
2016-01-01 1 A
2016-01-02 1 B
2016-01-03 1 B
2016-01-04 2 C
2016-01-05 2 B
2016-01-06 2 B
2016-01-07 2 B
2016-01-08 2 C
2016-01-09 2 B
would return
userid times
1 2
2 3
Is this even possible in Hive without a user-defined function? I have dug into LAG and LEAD but could not find a way. :(
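-- Longest run of each value per user.
-- Within one unbroken run of the same value, rn and rn_val both grow by 1
-- per row, so rn - rn_val stays constant and identifies that run.
select   value
        ,userid
        ,max (times) as times              -- longest run per (value, userid)
from    (select   value
                 ,userid
                 ,count (*) as times       -- length of one run
         from    (select   value
                          ,userid
                          ,row_number () over
                           (
                               partition by userid
                               order by     time
                           ) as rn         -- position within the user's rows
                          ,row_number () over
                           (
                               partition by userid,value
                               order by     time
                           ) as rn_val     -- position among the user's rows with this value
                  from     t
                  ) t
         -- where value = 'B'              -- uncomment to report only the runs of B
         group by value
                 ,userid
                 ,rn - rn_val
         ) t
group by value
        ,userid
order by value
        ,userid
;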
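To see why grouping by rn - rn_val isolates each run, here is a small Python sketch (not Hive) that replays the query's bookkeeping over the sample data from the question. The function name longest_runs and the in-memory rows list are made up for illustration; only the logic is meant to mirror the query above.

```python
from collections import defaultdict

# Sample rows from the question: (time, userid, value)
rows = [
    ("2016-01-01", 1, "A"),
    ("2016-01-02", 1, "B"),
    ("2016-01-03", 1, "B"),
    ("2016-01-04", 2, "C"),
    ("2016-01-05", 2, "B"),
    ("2016-01-06", 2, "B"),
    ("2016-01-07", 2, "B"),
    ("2016-01-08", 2, "C"),
    ("2016-01-09", 2, "B"),
]


def longest_runs(rows):
    """Return {(userid, value): longest consecutive run} (hypothetical helper)."""
    rn = defaultdict(int)       # row_number() over (partition by userid order by time)
    rn_val = defaultdict(int)   # row_number() over (partition by userid, value order by time)
    run_len = defaultdict(int)  # count(*) grouped by (userid, value, rn - rn_val)

    # Process each user's rows in time order, as the window functions do.
    for _time, userid, value in sorted(rows, key=lambda r: (r[1], r[0])):
        rn[userid] += 1
        rn_val[(userid, value)] += 1
        diff = rn[userid] - rn_val[(userid, value)]  # constant inside one run
        run_len[(userid, value, diff)] += 1

    # Outer query: max(times) grouped by (value, userid).
    longest = defaultdict(int)
    for (userid, value, _diff), n in run_len.items():
        longest[(userid, value)] = max(longest[(userid, value)], n)
    return dict(longest)


result = longest_runs(rows)
# Keep only value = 'B', which is what the question asks for.
print({userid: n for (userid, value), n in result.items() if value == "B"})
# prints {1: 2, 2: 3}
```

For user 2, the three consecutive B rows get rn = 2, 3, 4 and rn_val = 1, 2, 3, so the difference is 1 for all of them. The later B on 2016-01-09 gets rn = 6 and rn_val = 4, a difference of 2, so it lands in a separate group.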