How to rewrite SQL with multiple subqueries in Hive

Here is a Greenplum SQL query with multiple subqueries. Unfortunately, I have to migrate it to Hive, and I don't know how to handle the subqueries in the WHERE clause.

select 
    t.ckid , t.prod_id , t.supp_num , t.wljhdh , 
    sum(t.sssl) as zmkc , max(t.dj) as dj
from 
    %s t
where
    exists (select 1 
            from dw_stage.wms_c_wlsjd w 
            where w.lydjh = t.wljhdh and w.lzztflag='上架确认'
              and (ckid , kqid) in (select ckid , kqid 
                                    from dw_stage.jcxx_kqxx 
                                    where kqytsxid in ('2','3'))
        )
        and (t.ckid,t.supp_num)  in (select cgck_stock_id,vndr_code from madfrog.cfg_vendor_dist where status=1 and send_method=2 and upper(purch_warehouse_type)='F')
        and supp_num not in (select distinct vndr_code as supp_no from madfrog.cfg_vendor_dist where status=1 and send_method in (4,5))
    group by t.ckid , t.prod_id , t.supp_num , t.wljhdh

Thanks for any hints.

You need to convert the subquery / IN clauses into a LEFT OUTER JOIN.

Follow this structure:

select <cols list> 
from <tabname> t
left outer join dw_stage.wms_c_wlsjd w 
 on w.lydjh = t.wljhdh 
where w.lzztflag='上架确认'
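One caveat about this structure: because w.lzztflag is filtered in the WHERE clause, the outer join effectively behaves like an inner join, and a t row that matches several wms_c_wlsjd rows is repeated, which inflates sum(t.sssl). For the EXISTS condition specifically, Hive's LEFT SEMI JOIN returns each t row at most once, which is closer to EXISTS semantics. Below is a minimal sketch of that alternative (a swapped-in technique, not the structure above). It assumes the unqualified (ckid, kqid) in the original EXISTS subquery refer to columns of wms_c_wlsjd; the aliases s and k are illustrative, and %s is still the question's table placeholder.

```sql
-- EXISTS (...) rewritten as LEFT SEMI JOIN (sketch; assumes wms_c_wlsjd
-- has ckid and kqid columns, as the original EXISTS subquery implies)
select t.ckid, t.prod_id, t.supp_num, t.wljhdh,
       sum(t.sssl) as zmkc, max(t.dj) as dj
from %s t
left semi join (
    -- records flagged '上架确认' whose (ckid, kqid) is a zone with kqytsxid 2 or 3
    select w.lydjh
    from dw_stage.wms_c_wlsjd w
    left semi join (
        select ckid, kqid
        from dw_stage.jcxx_kqxx
        where kqytsxid in ('2', '3')
    ) k
      on w.ckid = k.ckid and w.kqid = k.kqid
    where w.lzztflag = '上架确认'
) s
  on t.wljhdh = s.lydjh
group by t.ckid, t.prod_id, t.supp_num, t.wljhdh;
```

Hive only allows the right-hand side of a LEFT SEMI JOIN to be referenced in the ON clause, so the SELECT and WHERE clauses above use only columns of the left-hand tables (t and w).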

Likewise, the conditions

(t.ckid, t.supp_num) in (select .. )

supp_num not in (select distinct vndr_code as supp_no ...)

need to be rewritten as outer joins.
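A sketch of those two conditions rewritten for Hive, leaving out the EXISTS condition for brevity and not checked against a specific Hive version. The NOT IN becomes a LEFT OUTER JOIN that keeps only unmatched rows (x.supp_no is null), and the two-column IN becomes a LEFT SEMI JOIN on both key columns (a swapped-in choice; an inner join against a deduplicated subquery would also work). The aliases v and x are illustrative, and %s is the question's placeholder.

```sql
select t.ckid, t.prod_id, t.supp_num, t.wljhdh,
       sum(t.sssl) as zmkc, max(t.dj) as dj
from %s t
-- supp_num NOT IN (...)  ->  LEFT OUTER JOIN, then keep rows with no match
left outer join (
    select distinct vndr_code as supp_no
    from madfrog.cfg_vendor_dist
    where status = 1 and send_method in (4, 5)
) x
  on t.supp_num = x.supp_no
-- (t.ckid, t.supp_num) IN (...)  ->  LEFT SEMI JOIN on both key columns
left semi join (
    select cgck_stock_id, vndr_code
    from madfrog.cfg_vendor_dist
    where status = 1 and send_method = 2
      and upper(purch_warehouse_type) = 'F'
) v
  on t.ckid = v.cgck_stock_id and t.supp_num = v.vndr_code
-- anti-join filter: drop rows whose supp_num was found in x
where x.supp_no is null
group by t.ckid, t.prod_id, t.supp_num, t.wljhdh;
```

The anti-join matches NOT IN only when no NULLs are involved: if the subquery can return a NULL vndr_code, the original NOT IN returns no rows at all, whereas the join version still returns the unmatched rows.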

You can find more information about using outer joins in my answer to another question: Hive command to execute NOT IN clause