Hive lag window partition
This is my table:
sensor_name, ext_value, int_value, growth
47ACXVMACSENS01, 238, 157, 1
47ACXVMACSENS01, 157, 256, 2
47ACXVMACSENS01, 895, 345, 3
47ACXVMACSENS01, 79, 861, 3
91DKCVMACSENS02, 904, 858, 1
91DKCVMACSENS02, 925, 588, 1
91DKCVMACSENS02, 15, 738, 1
91DKCVMACSENS02, 77, 38, 2
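The first 3 columns (sensor_name, ext_value, int_value) are the given data. The 4th column is the calculated column I want, and it is computed per sensor_name group from the column values (ext_value, int_value).
The growth column is calculated as follows:
For each sensor_name group, each row's int_value is compared with the previous row's ext_value. If there is no previous row, its ext_value is taken as 0. If the current row's int_value is higher than the previous row's ext_value, growth increases by 1. If the current int_value is lower than the previous row's ext_value, growth stays the same as the previous growth value.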
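As a reference for the expected output, here is a minimal plain-Python sketch (not Hive) of the rule as stated, applied to the example table. It assumes the rows are already in the intended order and that each sensor's rows are contiguous; `ROWS` and `compute_growth` are hypothetical names. The description does not say what happens when int_value equals the previous ext_value; this sketch treats a tie as "not higher" and leaves growth unchanged.

```python
from itertools import groupby

# Example rows from the question: (sensor_name, ext_value, int_value).
# Assumption: rows are already in the intended order within each sensor.
ROWS = [
    ("47ACXVMACSENS01", 238, 157),
    ("47ACXVMACSENS01", 157, 256),
    ("47ACXVMACSENS01", 895, 345),
    ("47ACXVMACSENS01", 79, 861),
    ("91DKCVMACSENS02", 904, 858),
    ("91DKCVMACSENS02", 925, 588),
    ("91DKCVMACSENS02", 15, 738),
    ("91DKCVMACSENS02", 77, 38),
]


def compute_growth(rows):
    """Apply the growth rule row by row, restarting for each sensor_name."""
    result = []
    # groupby only merges *consecutive* rows with the same key, so each
    # sensor's rows are assumed to be contiguous (as in the example table).
    for sensor, group in groupby(rows, key=lambda r: r[0]):
        prev_ext = 0  # no previous row -> its ext_value is treated as 0
        growth = 0
        for _, ext_value, int_value in group:
            if int_value > prev_ext:
                growth += 1  # current int_value beats the previous ext_value
            # otherwise (lower or equal) growth keeps its previous value
            result.append((sensor, ext_value, int_value, growth))
            prev_ext = ext_value
    return result


for row in compute_growth(ROWS):
    print(*row)
```

The printed growth values are 1, 2, 3, 3 for the first sensor and 1, 1, 1, 2 for the second, matching the 4th column of the table.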
In the example above:
for the very first row, 157 is compared with the previous row's ext_value; there is no previous row, so it is 0.
157 > 0, so growth increases by 1 from 0, giving 1.
on the 2nd row, 256 > 238, so growth = 1+1 = 2
on the 3rd row, 345 > 157, so growth = 2+1 = 3
on the 4th row, 861 < 895, so growth stays at the previous value, 3.
then the logic is re-applied to the second sensor_name group:
1st row, 858 > 0 (because there is no previous row for this sensor_name), so growth = 1
2nd row, 588 < 904, so growth = 1
3rd row, 738 < 925, so growth = 1
4th row, 38 > 15, so growth = 1+1 = 2
I tried using a lag window partitioned by sensor_name, but so far it has not given me the correct result.
How can I solve this?
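Use lag to get the previous ext_value, compute a growth flag, and compute growth as a running count of that flag.
lag needs an ORDER BY to define which row is the "previous" one, and rows in a Hive table have no guaranteed order, so, as you mentioned in the comments, I added an rcv_time column: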
with your_table as ( --use your table instead of this
select stack(8,
'47ACXVMACSENS01', 238, 157, '2019-11-01 10:10:01',
'47ACXVMACSENS01', 157, 256, '2019-11-01 10:10:02',
'47ACXVMACSENS01', 895, 345, '2019-11-01 10:10:03',
'47ACXVMACSENS01', 79, 861, '2019-11-01 10:10:04',
'91DKCVMACSENS02', 904, 858, '2019-11-01 10:10:05',
'91DKCVMACSENS02', 925, 588, '2019-11-01 10:10:06',
'91DKCVMACSENS02', 15, 738, '2019-11-01 10:10:07',
'91DKCVMACSENS02', 77, 38, '2019-11-01 10:10:08'
) as (sensor_name, ext_value, int_value, rcv_time )
)
select sensor_name, ext_value, int_value,
       -- running count of rows where int_value exceeds the previous ext_value
       count(case when int_value>prev_ext_value then true end) over(partition by sensor_name order by rcv_time) growth
from
(
 select sensor_name, ext_value, int_value, rcv_time,
        -- previous row's ext_value within the same sensor; 0 for the first row
        lag(ext_value,1,0) over(partition by sensor_name order by rcv_time) prev_ext_value
 from your_table
)s;
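To try the same window logic without a Hive cluster, here is a hedged sketch that runs an adapted query through Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later (needed for window functions); Hive's stack() is replaced by a VALUES list in the CTE; `then true` becomes `then 1` (COUNT only counts non-NULL values, so either works); and an outer ORDER BY is added because SQLite does not guarantee output order without one. This is not Hive, only a local check of the lag + running count idea.

```python
import sqlite3

# Local check in SQLite (window functions need SQLite 3.25+), not Hive.
# Hive's stack() is Hive-only, so a VALUES list stands in for the sample data.
QUERY = """
WITH your_table(sensor_name, ext_value, int_value, rcv_time) AS (
  VALUES
    ('47ACXVMACSENS01', 238, 157, '2019-11-01 10:10:01'),
    ('47ACXVMACSENS01', 157, 256, '2019-11-01 10:10:02'),
    ('47ACXVMACSENS01', 895, 345, '2019-11-01 10:10:03'),
    ('47ACXVMACSENS01', 79, 861, '2019-11-01 10:10:04'),
    ('91DKCVMACSENS02', 904, 858, '2019-11-01 10:10:05'),
    ('91DKCVMACSENS02', 925, 588, '2019-11-01 10:10:06'),
    ('91DKCVMACSENS02', 15, 738, '2019-11-01 10:10:07'),
    ('91DKCVMACSENS02', 77, 38, '2019-11-01 10:10:08')
)
SELECT sensor_name, ext_value, int_value,
       -- running count of rows whose int_value beats the previous ext_value
       COUNT(CASE WHEN int_value > prev_ext_value THEN 1 END)
         OVER (PARTITION BY sensor_name ORDER BY rcv_time) AS growth
FROM (
  SELECT sensor_name, ext_value, int_value, rcv_time,
         -- previous ext_value in the same sensor, 0 for the first row
         LAG(ext_value, 1, 0)
           OVER (PARTITION BY sensor_name ORDER BY rcv_time) AS prev_ext_value
  FROM your_table
) AS s
ORDER BY sensor_name, rcv_time
"""

conn = sqlite3.connect(":memory:")
rows = conn.execute(QUERY).fetchall()
conn.close()

for row in rows:
    print(*row)
```

The growth column should come out as 1, 2, 3, 3, 1, 1, 1, 2, the same values as in the question.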
Result:
47ACXVMACSENS01 238 157 1
47ACXVMACSENS01 157 256 2
47ACXVMACSENS01 895 345 3
47ACXVMACSENS01 79 861 3
91DKCVMACSENS02 904 858 1
91DKCVMACSENS02 925 588 1
91DKCVMACSENS02 15 738 1
91DKCVMACSENS02 77 38 2
This produces exactly the same result as your example.
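Why a running count gives the same answer as the row-by-row rule: growth only ever goes up by exactly 1 or stays the same, so at any row it equals the number of rows so far (within the sensor) where the "int_value > previous ext_value" flag was true. A minimal Python sketch (not Hive) of the query's three steps for one sensor whose rows are already ordered by rcv_time; `SENS01`, `SENS02` and `growth_by_running_count` are hypothetical names.

```python
from itertools import accumulate

# Each sensor's rows from the example, already ordered by rcv_time:
# (ext_value, int_value).
SENS01 = [(238, 157), (157, 256), (895, 345), (79, 861)]
SENS02 = [(904, 858), (925, 588), (15, 738), (77, 38)]


def growth_by_running_count(rows):
    ext_values = [ext for ext, _ in rows]
    # Step 1: like lag(ext_value, 1, 0), previous row's ext_value, 0 for the first row.
    prev_ext = [0] + ext_values[:-1]
    # Step 2: growth flag, 1 when int_value beats the previous ext_value.
    flags = [1 if int_value > prev else 0
             for (_, int_value), prev in zip(rows, prev_ext)]
    # Step 3: running count of the flags, like count(...) over(... order by rcv_time).
    return list(accumulate(flags))


print(growth_by_running_count(SENS01))  # [1, 2, 3, 3]
print(growth_by_running_count(SENS02))  # [1, 1, 1, 2]
```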