Array operations (addition of arrays) in Hive

I have a Hive table with columns id (String) and val (String):

id,val
abc,{0|1|0}
abc,{0|1|1}
abc,{1|0|1|1}

I want to add up the val column, grouped by the id column. The expected result is:

id,val
abc,{1|2|2|1}

This result is obtained by adding the arrays element-wise, position by position.

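I tried using lateral view explode and then casting to int, among other things, but could not get the expected result. I know a UDF is also an option, but is there another way using only Hive?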

Any suggestions would be helpful.

Thanks

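First strip the {} characters with regexp_replace, split the remaining string on '|', and use lateral view posexplode to sum the numbers that share the same position.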

-- tbl stands for the table from the question; pe is the lateral view alias
select id,pos,sum(cast(split_val as int)) as total
from tbl lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) pe as pos,split_val
group by id,pos

Then use collect_list to build the final array.

select id,collect_list(total)
from (select id,pos,sum(cast(split_val as int)) as total
      from tbl lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) pe as pos,split_val
      group by id,pos
     ) t
group by id
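
One caveat with the query above: collect_list gathers values in whatever order the rows arrive, so the summed elements are not guaranteed to come back in position order. Below is a minimal sketch of a common workaround, sorting the inner result by pos before collecting. The CTE name sample_tbl and its inline rows are assumptions added only so the query runs on its own. The sort-then-collect pattern is widely used, but Hive does not formally promise that the outer aggregation preserves that order, so treat this as a sketch rather than a guarantee.

```sql
-- sample_tbl is a hypothetical inline copy of the question's data.
with sample_tbl as (
  select 'abc' as id, '{0|1|0}' as val
  union all select 'abc' as id, '{0|1|1}' as val
  union all select 'abc' as id, '{1|0|1|1}' as val
)
select id, collect_list(total) as val
from (
  select id, pos, sum(cast(split_val as int)) as total
  from sample_tbl
  lateral view posexplode(split(regexp_replace(val, '[{}]', ''), '\\|')) pe as pos, split_val
  group by id, pos
  -- Send all rows of one id to the same reducer, ordered by position,
  -- so that collect_list is likely to see them in position order.
  distribute by id sort by id, pos
) t
group by id;
```

If the order ever comes out wrong on a given engine or setting, collecting pos together with total and sorting afterwards is the safer direction to explore.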

Here is one possible approach; there may be better ones.

select * from tbl1;

+----------+------------+--+
| tbl1.id  |  tbl1.val  |
+----------+------------+--+
| abc      | {0|1|0}    |
| abc      | {0|1|1}    |
| abc      | {1|0|1|1}  |
+----------+------------+--+

Write the data out somewhere without the {}

insert overwrite directory '/user/cloudera/tbl2' 
row format delimited fields terminated by ','
select id, substr(val,2,length(val)-2) as val2 from tbl1

Create a table to use it

create external table tbl3(id string, val array<int>)
row format delimited
fields terminated by ','
collection items terminated by '|'
location '/user/cloudera/tbl2'

+----------+------------+--+
| tbl3.id  |  tbl3.val  |
+----------+------------+--+
| abc      | [0,1,0]    |
| abc      | [0,1,1]    |
| abc      | [1,0,1,1]  |
+----------+------------+--+

Use posexplode

select id, collect_list(val) 
from (
  select id, sum(c) as val 
    from (
      select id, i, c from tbl3 
      lateral view posexplode(val) v1 as i, c 
    ) tbl 
  group by id, i
  ) tbl2 
group by id

Result

+------+------------+--+
|  id  |    _c1     |
+------+------------+--+
| abc  | [1,2,2,1]  |
+------+------------+--+

Hive table mytab:

+----------+------------+
|    id    |     val    |
+----------+------------+
|   abc    | {0|1|0}    |
|   abc    | {0|1|1}    |
|   abc    | {1|0|1|1}  |
+----------+------------+

Expected output:

+----------+------------+
|    id    |     val    |
+----------+------------+
|   abc    | {1|2|2|1}  |
+----------+------------+

Hive query used:

select id,
       concat('{', concat_ws('|', collect_list(cast(cast(expl_val_sum as int) as string))), '}') as coll_expl_val
from (
  select id, index, sum(expl_val) as expl_val_sum
  from mytab
  lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) exp as index, expl_val
  group by id, index
) a
group by id;

1. First, posexplode explodes the array<string> produced by split, returning each element together with its index.
2. The values are then summed position by position by grouping on the index column.
3. cast as int converts the sums from decimal values to integers (sum over string input returns a double).
4. The integers are cast to string and gathered back into an array<string> with collect_list.
5. concat_ws joins the array elements with '|' as the delimiter.
6. Finally, concat adds the enclosing '{' and '}'.
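
To make steps 1 and 2 concrete, here is a small sketch that runs only the inner part of the query and shows the per-position sums before they are reassembled. The CTE mytab_sample and the aliases pe and idx are assumptions introduced for illustration (idx is used in place of index to stay clear of keyword clashes); the cast to int is applied before summing, which is one way to avoid the intermediate decimal values.

```sql
-- mytab_sample is a hypothetical inline copy of mytab.
with mytab_sample as (
  select 'abc' as id, '{0|1|0}' as val
  union all select 'abc' as id, '{0|1|1}' as val
  union all select 'abc' as id, '{1|0|1|1}' as val
)
select id, idx, sum(cast(expl_val as int)) as expl_val_sum
from mytab_sample
lateral view posexplode(split(regexp_replace(val, '[{}]', ''), '\\|')) pe as idx, expl_val
group by id, idx;
-- Rows expected for the sample data (row order is not guaranteed):
-- abc  0  1
-- abc  1  2
-- abc  2  2
-- abc  3  1
```

Position 3 exists only in the longest array, so its sum is just that single value; shorter arrays simply contribute nothing at that index.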

Thank you for all your replies.