使用 last_value window 函数时 HIVE 中的语义异常错误
Semantic exception error in HIVE while using last_value window function
我有一个 table 包含以下数据:
dt device id count
2018-10-05 computer 7541185957382 6
2018-10-20 computer 7541185957382 3
2018-10-14 computer 7553187775734 6
2018-10-17 computer 7553187775734 10
2018-10-21 computer 7553187775734 2
2018-10-22 computer 7549187067178 5
2018-10-20 computer 7553187757256 3
2018-10-11 computer 7549187067178 10
我想为每个 id
获取最后一个和第一个 dt
。因此,我使用 window 函数 first_value 和 last_value 如下:
select id,last_value(dt) over (partition by id order by dt) last_dt
from table
order by id
;
但是我收到这个错误:
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: Primitve type DATE not supported in Value Boundary expression
我无法诊断问题,如有任何帮助,我将不胜感激。
如果您在查询中添加 rows between 子句,那么您的查询将正常工作。
hive> select id,last_value(dt) over (partition by id order by dt
rows between unbounded preceding and unbounded following) last_dt
from table order by id;
结果:
+----------------+-------------+--+
| id | last_dt |
+----------------+-------------+--+
| 7541185957382 | 2018-10-20 |
| 7541185957382 | 2018-10-20 |
| 7549187067178 | 2018-10-22 |
| 7549187067178 | 2018-10-22 |
| 7553187757256 | 2018-10-20 |
| 7553187775734 | 2018-10-21 |
| 7553187775734 | 2018-10-21 |
| 7553187775734 | 2018-10-21 |
+----------------+-------------+--+
有 Jira 关于原始类型支持并在 Hive.2.1.0
中得到修复
更新:
对于不同的记录,您可以使用 ROW_NUMBER window 函数并从结果集中仅过滤掉 first row
。
hive> select id,last_dt from
(select id,last_value(dt) over (partition by id order by dt
rows between unbounded preceding and unbounded following) last_dt,
ROW_NUMBER() over (partition by id order by dt)rn
from so )t
where t.rn=1;
结果:
+----------------+-------------+--+
| id | dt |
+----------------+-------------+--+
| 7541185957382 | 2018-10-20 |
| 7553187757256 | 2018-10-20 |
| 7553187775734 | 2018-10-21 |
| 7549187067178 | 2018-10-22 |
+----------------+-------------+--+
我有一个 table 包含以下数据:
dt device id count
2018-10-05 computer 7541185957382 6
2018-10-20 computer 7541185957382 3
2018-10-14 computer 7553187775734 6
2018-10-17 computer 7553187775734 10
2018-10-21 computer 7553187775734 2
2018-10-22 computer 7549187067178 5
2018-10-20 computer 7553187757256 3
2018-10-11 computer 7549187067178 10
我想为每个 id
获取最后一个和第一个 dt
。因此,我使用 window 函数 first_value 和 last_value 如下:
select id,last_value(dt) over (partition by id order by dt) last_dt
from table
order by id
;
但是我收到这个错误:
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: Primitve type DATE not supported in Value Boundary expression
我无法诊断问题,如有任何帮助,我将不胜感激。
如果您在查询中添加 rows between 子句,那么您的查询将正常工作。
hive> select id,last_value(dt) over (partition by id order by dt
rows between unbounded preceding and unbounded following) last_dt
from table order by id;
结果:
+----------------+-------------+--+
| id | last_dt |
+----------------+-------------+--+
| 7541185957382 | 2018-10-20 |
| 7541185957382 | 2018-10-20 |
| 7549187067178 | 2018-10-22 |
| 7549187067178 | 2018-10-22 |
| 7553187757256 | 2018-10-20 |
| 7553187775734 | 2018-10-21 |
| 7553187775734 | 2018-10-21 |
| 7553187775734 | 2018-10-21 |
+----------------+-------------+--+
有 Jira 关于原始类型支持并在 Hive.2.1.0
中得到修复更新:
对于不同的记录,您可以使用 ROW_NUMBER window 函数并从结果集中仅过滤掉 first row
。
hive> select id,last_dt from
(select id,last_value(dt) over (partition by id order by dt
rows between unbounded preceding and unbounded following) last_dt,
ROW_NUMBER() over (partition by id order by dt)rn
from so )t
where t.rn=1;
结果:
+----------------+-------------+--+
| id | dt |
+----------------+-------------+--+
| 7541185957382 | 2018-10-20 |
| 7553187757256 | 2018-10-20 |
| 7553187775734 | 2018-10-21 |
| 7549187067178 | 2018-10-22 |
+----------------+-------------+--+