How to condense multiple where conditions in HiveQL across a range of columns: where icd_proc_cd_1='43644' or ... or icd_proc_cd_28='43644'

Question

Is there a more elegant way to condense a where clause that tests a series of columns with indexed names?

For example, instead of:

create table table_cpt_43644
as select *
from master_table
where icd_proc_cd_1 = '43644' 
or icd_proc_cd_2 = '43644'
or icd_proc_cd_3 = '43644'
...
or icd_proc_cd_28 = '43644';

use something like the following (which unfortunately does not work):

create table table_cpt43644
as select *
from master_table
where icd_proc_cd_1-icd_proc_cd_28 = '43644';

Answer 1

Using array_contains is slightly shorter:

where 
array_contains(
array( icd_proc_cd_1,icd_proc_cd_2,icd_proc_cd_3,icd_proc_cd_4,icd_proc_cd_5,icd_proc_cd_6,icd_proc_cd_7,icd_proc_cd_8,icd_proc_cd_9,icd_proc_cd_10,
       icd_proc_cd_11,icd_proc_cd_12,icd_proc_cd_13,icd_proc_cd_14, icd_proc_cd_15,icd_proc_cd_16,icd_proc_cd_17,icd_proc_cd_18,icd_proc_cd_19,icd_proc_cd_20, 
       icd_proc_cd_21,icd_proc_cd_22,icd_proc_cd_23,icd_proc_cd_24,icd_proc_cd_25,icd_proc_cd_26,icd_proc_cd_27,icd_proc_cd_28
     ), '43644')

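If your table is based on CSV files, you can redefine the table DDL with RegexSerDe so that icd_proc_cd_1 through icd_proc_cd_28 are read as a single comma-separated column. You can then use the shorter array_contains(split(column_concatenated, ','), '43644'). Using rlike is also possible in this case. The first solution is more flexible, though, because the columns remain individually addressable.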
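A rough sketch of that RegexSerDe variant, under assumptions that are not in the question: the CSV is taken to have one leading key column (called patient_id here, a hypothetical name) followed by the 28 code columns, with no quoted fields, and the table name master_table_codes and the location path are placeholders. The regex would need to be adapted to the real file layout.

```sql
-- Hypothetical external table over the same CSV files as master_table.
-- Assumed line layout: <patient_id>,<code 1>,...,<code 28> (no quoted fields).
-- RegexSerDe maps each capture group to one column, so the second group
-- keeps all 28 codes together as a single comma-separated string.
create external table master_table_codes (
  patient_id          string,  -- hypothetical leading key column
  column_concatenated string   -- the 28 codes, still comma-separated
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties (
  'input.regex' = '^([^,]*),(.*)$'
)
stored as textfile
location '/path/to/csv/dir';  -- placeholder path

-- Variant 1: split the string into an array and test membership.
select *
from master_table_codes
where array_contains(split(column_concatenated, ','), '43644');

-- Variant 2: rlike, anchored so that only a whole element matches
-- (for example, '143644' or '436440' would not match).
select *
from master_table_codes
where column_concatenated rlike '(^|,)43644(,|$)';
```

Both variants compare the code as an exact string, so values padded with spaces in the file would need trimming first. Lines that do not match input.regex are returned as NULL columns rather than raising an error.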

