How to get a list of only keys from Hive map

I have a map stored in a column in Hive, and the keys can differ from row to row. How can I get a list of only the keys from each map?

The function map_keys(Map) returns an unordered array containing the keys of the input map.

Example (see the comments in the code):

    with mydata as (
    select 1 id, map('key11','val11','key12','val12','key13','val13') as mymap
    union all
    select 2 id, map('key21','val21','key22','val22','key13','val13') as mymap -- key13 also exists in the first row
    )

    select id, map_keys(d.mymap) keys
      from mydata d
    ;

Result:

    id  keys
    1   ["key11","key12","key13"]
    2   ["key21","key22","key13"]

If you need a list of distinct keys across all rows, explode the array and collect it again with collect_set, which returns an array of distinct keys:

    with mydata as (
    select 1 id, map('key11','val11','key12','val12','key13','val13') as mymap
    union all
    select 2 id, map('key21','val21','key22','val22','key13','val13') as mymap -- key13 also exists in the first row
    )

    select --id,
           collect_set(key) as keys
      from mydata d
           lateral view outer explode(map_keys(d.mymap)) e as key
     --group by id   -- without id in the group by, you get the distinct list of keys across all rows
                     -- with id in the group by, you get the list of map keys for each row
    ;

Result:

    ["key11","key12","key13","key21","key22"]
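Neither map_keys nor collect_set guarantees the order of the elements it returns, so the ordering shown above should not be relied on. If a stable order matters, one option is to wrap the result in sort_array. The following is a minimal sketch under that assumption, reusing the same sample data; the alias sorted_keys is only illustrative:

```sql
with mydata as (
select 1 id, map('key11','val11','key12','val12','key13','val13') as mymap
union all
select 2 id, map('key21','val21','key22','val22','key13','val13') as mymap
)

-- explode the keys of every map into rows, collect the distinct ones,
-- then sort the resulting array in ascending order
select sort_array(collect_set(key)) as sorted_keys
  from mydata d
       lateral view outer explode(map_keys(d.mymap)) e as key
;
```

As in the query above, adding id to the select list together with group by id would give a sorted key list per row instead of one list for the whole table.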