I want to extract some columns from table A for rows that have no entry in table B. How can I do this in Hive?

Question

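I want to extract some columns from table (A) for rows that have no entry in table (B). How can I achieve this in Hive? I'm working on a query (below), but it currently doesn't work. Please help.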

Join columns: product_name in prd_raw_sf.sf_opportunity_dn to SFDC_PRODUCT_NAME in prd_raw_sf.sf_product_pcu_mapping

select *
FROM prd_raw_sf.sf_opportunity_dn  
JOIN prd_raw_sf.sf_si_accounts_mapping ON prd_raw_sf.sf_opportunity_dn.account_name = prd_raw_sf.sf_si_accounts_mapping.sfdc_account_name
WHERE prd_raw_sf.sf_opportunity_dn.account_name not in (select * from prd_raw_sf.sf_si_accounts_mapping);
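The query above has two independent problems, and both can be reproduced on toy data. First, NOT IN (select * ...) compares a single value against a multi-column row; SQLite rejects this, and Hive likewise expects a single column in an IN/NOT IN subquery. Second, the inner JOIN has already discarded every opportunity without a mapping before NOT IN runs, so the result is empty even after the subquery is fixed. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption: error messages differ in Hive). The opportunity_id and si_name columns and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sf_opportunity_dn (opportunity_id INTEGER, account_name TEXT);
    CREATE TABLE sf_si_accounts_mapping (sfdc_account_name TEXT, si_name TEXT);
    INSERT INTO sf_opportunity_dn VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO sf_si_accounts_mapping VALUES ('Acme', 'SI-1'), ('Globex', 'SI-2');
""")

# Problem 1: NOT IN (SELECT * ...) compares one value with a two-column row,
# which the engine rejects when the statement is prepared.
try:
    conn.execute("""
        SELECT * FROM sf_opportunity_dn
        WHERE account_name NOT IN (SELECT * FROM sf_si_accounts_mapping)
    """).fetchall()
    not_in_star_error = None
except sqlite3.OperationalError as exc:
    not_in_star_error = str(exc)

# Problem 2: even with a single-column subquery, the inner JOIN keeps only
# matched accounts, and NOT IN then removes exactly those, leaving nothing.
join_then_not_in = conn.execute("""
    SELECT o.*
    FROM sf_opportunity_dn o
    JOIN sf_si_accounts_mapping m ON o.account_name = m.sfdc_account_name
    WHERE o.account_name NOT IN (SELECT sfdc_account_name FROM sf_si_accounts_mapping)
""").fetchall()

print(not_in_star_error)
print(join_then_not_in)  # []
```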

Answer 1

I would recommend NOT EXISTS. The JOIN seems unnecessary:

select o.*
from prd_raw_sf.sf_opportunity_dn o 
where not exists (select 1
                  from prd_raw_sf.sf_si_accounts_mapping a
                  where o.account_name = a.sfdc_account_name
                 );
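One concrete reason to prefer NOT EXISTS over NOT IN is NULL handling. Under standard SQL three-valued logic, which SQLite implements, a single NULL in the subquery column makes every NOT IN comparison UNKNOWN, so no rows come back. NOT EXISTS is unaffected. The sketch below uses Python's sqlite3 module in place of Hive (an assumption: verify NULL handling on your Hive version). The opportunity_id column and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sf_opportunity_dn (opportunity_id INTEGER, account_name TEXT);
    CREATE TABLE sf_si_accounts_mapping (sfdc_account_name TEXT);
    INSERT INTO sf_opportunity_dn VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    -- A NULL key in the mapping table is what trips up NOT IN.
    INSERT INTO sf_si_accounts_mapping VALUES ('Acme'), (NULL);
""")

# Correlated NOT EXISTS: keep o when no mapping row has the same account name.
not_exists = conn.execute("""
    SELECT o.opportunity_id
    FROM sf_opportunity_dn o
    WHERE NOT EXISTS (SELECT 1
                      FROM sf_si_accounts_mapping a
                      WHERE o.account_name = a.sfdc_account_name)
    ORDER BY o.opportunity_id
""").fetchall()

# NOT IN over a column containing NULL: 'Globex' <> NULL is UNKNOWN,
# so the whole predicate is never TRUE and no row qualifies.
not_in = conn.execute("""
    SELECT o.opportunity_id
    FROM sf_opportunity_dn o
    WHERE o.account_name NOT IN (SELECT sfdc_account_name FROM sf_si_accounts_mapping)
""").fetchall()

print(not_exists)  # [(2,), (3,)]
print(not_in)      # []
```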

Answer 2

The LEFT JOIN anti-join pattern comes to mind here:

select o.*
from prd_raw_sf.sf_opportunity_dn o
left join prd_raw_sf.sf_si_accounts_mapping m on o.account_name = m.sfdc_account_name
where m.sfdc_account_name is null;

The query tries to join every record in sf_opportunity_dn with sf_si_accounts_mapping. The WHERE clause then keeps only the records for which no match was found.
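The two steps just described (join everything, then keep the non-matches) can be traced on toy data. This is a minimal sketch that uses Python's sqlite3 module as a stand-in for Hive. The opportunity_id and si_name columns and the rows are hypothetical. Filtering on m.sfdc_account_name, the right-hand join key, is deliberate: a matched row always has a non-NULL key (it satisfied the equality), so a NULL there can only come from a missing match. Duplicate mappings multiply the matched rows, but each unmatched row still appears exactly once.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sf_opportunity_dn (opportunity_id INTEGER, account_name TEXT);
    CREATE TABLE sf_si_accounts_mapping (sfdc_account_name TEXT, si_name TEXT);
    INSERT INTO sf_opportunity_dn VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO sf_si_accounts_mapping VALUES ('Acme', 'SI-1'), ('Acme', 'SI-2');
""")

# Step 1: the LEFT JOIN keeps every opportunity; unmatched ones get NULLs in
# the mapping columns (Acme appears twice because it has two mappings).
joined = conn.execute("""
    SELECT o.opportunity_id, m.sfdc_account_name
    FROM sf_opportunity_dn o
    LEFT JOIN sf_si_accounts_mapping m ON o.account_name = m.sfdc_account_name
    ORDER BY o.opportunity_id, m.si_name
""").fetchall()

# Step 2: keeping rows whose right-hand join key is NULL leaves only the
# opportunities that found no match.
anti = conn.execute("""
    SELECT o.opportunity_id
    FROM sf_opportunity_dn o
    LEFT JOIN sf_si_accounts_mapping m ON o.account_name = m.sfdc_account_name
    WHERE m.sfdc_account_name IS NULL
    ORDER BY o.opportunity_id
""").fetchall()

print(joined)  # [(1, 'Acme'), (1, 'Acme'), (2, None), (3, None)]
print(anti)    # [(2,), (3,)]
```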

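With indexes on the following columns, this should be an efficient solution. This advice comes from traditional RDBMSs; Hive dropped index support in version 3.0, so it does not apply on recent Hive versions:

prd_raw_sf.sf_opportunity_dn(account_name)
prd_raw_sf.sf_si_accounts_mapping(sfdc_account_name)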

Note: table aliases make a query shorter and easier to read. I have added them to your query, and I recommend always using them.

Answer 3

You can use a LEFT JOIN or a LEFT SEMI JOIN.

LEFT JOIN approach:

select a.*
FROM prd_raw_sf.sf_opportunity_dn  as a
LEFT JOIN prd_raw_sf.sf_si_accounts_mapping as b 
      ON a.account_name = b.sfdc_account_name
WHERE b.sfdc_account_name is Null;

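LEFT SEMI JOIN (note: this returns the rows of a that do have a match in b, like EXISTS, so on its own it gives the complement of what the question asks for):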

select a.*
FROM prd_raw_sf.sf_opportunity_dn  as a
LEFT SEMI JOIN prd_raw_sf.sf_si_accounts_mapping as b 
      ON a.account_name = b.sfdc_account_name;

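Performance-wise, a LEFT SEMI JOIN is better than a LEFT JOIN. For each key, it stops at the first matching record in the second table and skips the remaining matches.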
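The "stop at the first match" behaviour follows from the semantics: a semi join returns each row of a at most once if it has any match in b, whereas a plain join repeats it once per match. The rows without a match, which the question asks for, are the complement and come from an anti join. This is a minimal sketch using Python's sqlite3 module. SQLite has no LEFT SEMI JOIN keyword, so EXISTS stands in for it, which is equivalent for an equality join condition like this one. The columns and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sf_opportunity_dn (opportunity_id INTEGER, account_name TEXT);
    CREATE TABLE sf_si_accounts_mapping (sfdc_account_name TEXT, si_name TEXT);
    INSERT INTO sf_opportunity_dn VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO sf_si_accounts_mapping VALUES ('Acme', 'SI-1'), ('Acme', 'SI-2');
""")

# Semi join written as EXISTS: each matched opportunity is returned once,
# however many mapping rows share its account name.
semi = conn.execute("""
    SELECT o.opportunity_id
    FROM sf_opportunity_dn o
    WHERE EXISTS (SELECT 1 FROM sf_si_accounts_mapping m
                  WHERE m.sfdc_account_name = o.account_name)
    ORDER BY o.opportunity_id
""").fetchall()

# Anti join: the complement of the semi join, i.e. the unmatched opportunities.
anti = conn.execute("""
    SELECT o.opportunity_id
    FROM sf_opportunity_dn o
    WHERE NOT EXISTS (SELECT 1 FROM sf_si_accounts_mapping m
                      WHERE m.sfdc_account_name = o.account_name)
    ORDER BY o.opportunity_id
""").fetchall()

# A plain join, by contrast, repeats opportunity 1 once per matching mapping.
plain = conn.execute("""
    SELECT o.opportunity_id
    FROM sf_opportunity_dn o
    JOIN sf_si_accounts_mapping m ON m.sfdc_account_name = o.account_name
""").fetchall()

print(semi)   # [(1,)]
print(anti)   # [(2,), (3,)]
print(plain)  # [(1,), (1,)]
```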


Tags: sql, hive, bigdata, hiveql