Create multiple columns from existing Hive table columns
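How can I create multiple columns from an existing Hive table? Sample data is shown below.
My requirement is to create 2 new columns from the existing table only when a condition is met:
col1 when code=1, col2 when code=2.
Expected output:
Please help with how to achieve this in a Hive query.
If you aggregate the required values into arrays, you can explode them and keep only the values with matching positions.
Demo: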
with
my_table as ( -- use your table instead of this CTE
  select stack(8,
    'a',1,
    'b',2,
    'c',3,
    'b1',2,
    'd',4,
    'c1',3,
    'a1',1,
    'd1',4
  ) as (col, code)
)
select c1.val as col1, c2.val as col2
from
(
  select collect_set(case when code=1 then col else null end) as col1,
         collect_set(case when code=2 then col else null end) as col2
  from my_table
  where code in (1,2)
) s
lateral view outer posexplode(col1) c1 as pos, val
lateral view outer posexplode(col2) c2 as pos, val
where c1.pos=c2.pos
Result:
col1 col2
a b
a1 b1
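This approach does not work if the arrays have different sizes: the filter on matching positions drops the unmatched elements of the longer array.
Another approach is to calculate row_number and do a full join on it. This works even if col1 and col2 have different numbers of values (the missing values will be NULL):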
with
my_table as ( -- use your table instead of this CTE
  select stack(8,
    'a',1,
    'b',2,
    'c',3,
    'b1',2,
    'd',4,
    'c1',3,
    'a1',1,
    'd1',4
  ) as (col, code)
),
ordered as
(
  select code, col, row_number() over(partition by code order by col) rn
  from my_table
  where code in (1,2)
)
select c1.col as col1, c2.col as col2
from (select * from ordered where code=1) c1
full join
     (select * from ordered where code=2) c2 on c1.rn = c2.rn
Result:
col1 col2
a b
a1 b1
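As a possible variation on the row_number approach, the full join can be swapped for conditional aggregation grouped by the row number. This is only a sketch under the same assumptions as the demo above (the `my_table` CTE and the `ordered` CTE are reused unchanged; replace `my_table` with your own table). It is not the method from the answer itself, just an alternative that reads the data once and also tolerates different numbers of values per code.

```sql
with
my_table as ( -- demo data only; use your table instead of this CTE
  select stack(8,
    'a',1, 'b',2, 'c',3, 'b1',2,
    'd',4, 'c1',3, 'a1',1, 'd1',4
  ) as (col, code)
),
ordered as (
  -- number the values within each code so they can be paired up by position
  select code, col,
         row_number() over (partition by code order by col) as rn
  from my_table
  where code in (1,2)
)
-- one output row per rn; a code with no value at that rn yields NULL
select max(case when code = 1 then col end) as col1,
       max(case when code = 2 then col end) as col2
from ordered
group by rn
```

With the demo data this should pair a with b and a1 with b1, the same rows as the full join version. The order of the output rows is not guaranteed unless you add an explicit ordering.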