Identify if the same value is repeated in a column using Hive

Question

I have a Hive table with a column named DATALIST. It can have the following values:

XYZ_OLD
XYZ_NEW
ABC_OLD
EFG_OLD
EFG_NEW
PQR_NEW

I need to create an output that identifies all the names that do not have both _NEW and _OLD in the column. In those scenarios, it should output the following:

Value  Reason
ABC    Missing NEW
PQR    Missing OLD
XYZ    Contains Both NEW and OLD
EFG    Contains both NEW and OLD

Any suggestions or help with SQL/HiveQL logic would be greatly appreciated.

Answer 1

I think you can do:

-- Hive arrays are 0-indexed, so [0] is the name before the underscore
select split(datalist, '_')[0] as value,
       (case when sum(case when datalist like '%NEW' then 1 else 0 end) > 0 and
                  sum(case when datalist like '%OLD' then 1 else 0 end) > 0
             then 'BOTH'
             when sum(case when datalist like '%NEW' then 1 else 0 end) > 0
             then 'NEW ONLY'
             else 'OLD ONLY'
         end) as reason
from t
group by split(datalist, '_')[0];
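
If you want the exact reason strings from the question, a variation along these lines might work. This is only a sketch: the inline CTE `t` stands in for the real table, the `value` and `reason` aliases are my own names, and it assumes every entry ends in `_OLD` or `_NEW` and that your Hive version allows SELECT without FROM (0.13 or later, as far as I know).

```sql
-- Sample data from the question, inlined so the query can be run on its own.
WITH t AS (
  SELECT 'XYZ_OLD' AS datalist UNION ALL
  SELECT 'XYZ_NEW' UNION ALL
  SELECT 'ABC_OLD' UNION ALL
  SELECT 'EFG_OLD' UNION ALL
  SELECT 'EFG_NEW' UNION ALL
  SELECT 'PQR_NEW'
)
SELECT
  -- Strip the trailing _OLD / _NEW suffix to get the base name.
  regexp_replace(datalist, '_(OLD|NEW)$', '') AS value,
  CASE
    -- No NEW row in the group: only OLD is present.
    WHEN SUM(CASE WHEN datalist LIKE '%NEW' THEN 1 ELSE 0 END) = 0
      THEN 'Missing NEW'
    -- No OLD row in the group: only NEW is present.
    WHEN SUM(CASE WHEN datalist LIKE '%OLD' THEN 1 ELSE 0 END) = 0
      THEN 'Missing OLD'
    ELSE 'Contains both NEW and OLD'
  END AS reason
FROM t
GROUP BY regexp_replace(datalist, '_(OLD|NEW)$', '');
```

Stripping only the trailing suffix, rather than splitting on the first underscore, keeps names that themselves contain an underscore intact. Row order is not guaranteed without an ORDER BY.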

