Picking the minimum value and its row in Hive
I need to pick the minimum value within a 2-hour sliding time window, along with its corresponding date value. For example:
Create table stock(time string, cost float);
Insert into stock values("1990-01-01 8:00 AM",4.5);
Insert into stock values("1990-01-01 9:00 AM",3.2);
Insert into stock values("1990-01-01 10:00 AM",3.1);
Insert into stock values("1990-01-01 11:00 AM",5.5);
Insert into stock values("1990-01-02 8:00 AM",5.1);
Insert into stock values("1990-01-02 9:00 AM",2.2);
Insert into stock values("1990-01-02 10:00 AM",1.5);
Insert into stock values("1990-01-02 11:00 AM",6.5);
Insert into stock values("1990-01-03 8:00 AM",8.1);
Insert into stock values("1990-01-03 9:00 AM",3.2);
Insert into stock values("1990-01-03 10:00 AM",2.5);
Insert into stock values("1990-01-03 11:00 AM",4.5);
For this I can write a query like:
select min(cost) over(order by unix_timestamp(time) range between current row and 7200 following)
from stock
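This looks ahead 2 hours (7200 seconds) from the current row and picks the minimum value.
For the first row the minimum is 3.1, found in the third row at 10:00 AM. This query gives me the correct minimum, but I also need the date value that goes with it; in this case I want "1990-01-01 10:00 AM". How can I select it?
Thanks,
Raj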
I think this is a hard problem. One approach is to join back to the table to find the value:
select s.*, smin.time as min_time
from (select s.*,
             min(cost) over (order by unix_timestamp(time)
                             range between current row and 7200 following) as min_cost
      from stock s
     ) s join
     stock smin
     on smin.cost = s.min_cost and
        unix_timestamp(smin.time) >= unix_timestamp(s.time) and
        -- upper bound is inclusive, matching the "7200 following" window frame
        unix_timestamp(smin.time) <= unix_timestamp(s.time) + 7200;
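The downside of this approach is that it can produce duplicates, which happens when several rows in the window tie for the minimum cost. If that is a problem: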
select s.*
from (select s.*, smin.time as min_time,
             -- keep only the earliest matching row for each source row
             row_number() over (partition by s.time
                                order by unix_timestamp(smin.time)) as seqnum
      from (select s.*,
                   min(cost) over (order by unix_timestamp(time)
                                   range between current row and 7200 following) as min_cost
            from stock s
           ) s join
           stock smin
           on smin.cost = s.min_cost and
              unix_timestamp(smin.time) >= unix_timestamp(s.time) and
              -- upper bound is inclusive, matching the "7200 following" window frame
              unix_timestamp(smin.time) <= unix_timestamp(s.time) + 7200
     ) s
where seqnum = 1;
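To sanity-check what the result should look like, here is a small reference sketch in plain Python rather than Hive. It is only an illustration: the names `ROWS`, `parse`, and `min_with_time` are made up for this example, the time format string is an assumption inferred from the sample data, and the window is treated as inclusive on both ends, which is how I read `range between current row and 7200 following`. Ties on cost go to the earliest time.

```python
from datetime import datetime

# Sample rows copied from the question: (time string, cost).
ROWS = [
    ("1990-01-01 8:00 AM", 4.5),
    ("1990-01-01 9:00 AM", 3.2),
    ("1990-01-01 10:00 AM", 3.1),
    ("1990-01-01 11:00 AM", 5.5),
    ("1990-01-02 8:00 AM", 5.1),
    ("1990-01-02 9:00 AM", 2.2),
    ("1990-01-02 10:00 AM", 1.5),
    ("1990-01-02 11:00 AM", 6.5),
    ("1990-01-03 8:00 AM", 8.1),
    ("1990-01-03 9:00 AM", 3.2),
    ("1990-01-03 10:00 AM", 2.5),
    ("1990-01-03 11:00 AM", 4.5),
]

WINDOW_SECONDS = 7200  # two hours, same as "7200 following"


def parse(text):
    # Assumed format for strings such as "1990-01-01 8:00 AM".
    return datetime.strptime(text, "%Y-%m-%d %I:%M %p")


def min_with_time(rows, window=WINDOW_SECONDS):
    """For each row, return (time, min cost in the window, time of that min)."""
    parsed = [(parse(text), text, cost) for text, cost in rows]
    result = []
    for start, label, _ in parsed:
        # Rows from the current one up to `window` seconds ahead, inclusive.
        in_window = [
            (cost, when, text)
            for when, text, cost in parsed
            if 0 <= (when - start).total_seconds() <= window
        ]
        # Tuples compare by cost first, then by time, so ties on cost
        # resolve to the earliest row.
        cost, _, min_label = min(in_window)
        result.append((label, cost, min_label))
    return result


if __name__ == "__main__":
    for row in min_with_time(ROWS):
        print(row)
```

For the first row this gives a minimum of 3.1 at "1990-01-01 10:00 AM", which is the value asked for in the question, so it can serve as a baseline to compare the Hive output against.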