LEFT JOIN after Group By and HAVING

I have a view, cnst_prsn_nm. I want to find records that share the same cnst_mstr_id and the same last name but have different first names. So I wrote this in Teradata SQL:

SELECT  TOP 20 prsn_nm_a.cnst_mstr_id  FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm

Then, for the cnst_mstr_ids of those records, I want to check another table, cnst_mstr. Basically, I want to find the rows where the LEFT JOIN comes back NULL:

LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL

So my query basically becomes:

SELECT  TOP 20 prsn_nm_a.cnst_mstr_id  FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL

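But now there are two WHERE clauses, and a LEFT JOIN cannot appear directly after HAVING either. How do I do a left join after the GROUP BY and HAVING clauses when there is a filter associated with the grouping?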

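The clauses of a SQL statement always appear in a specific order: first SELECT, then FROM, then the JOINs, then WHERE, then GROUP BY, then HAVING. You cannot deviate from that order, and you do not need (and cannot have) a second WHERE clause. Make your one WHERE clause include all the conditions you need.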

SELECT  TOP 20 prsn_nm_a.cnst_mstr_id  
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
  AND mstr_new.new_cnst_mstr_id IS NULL
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
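
If you want the join to come literally after the grouping, one option is to wrap the grouped query in a derived table and join to that. This is only a sketch: the alias grp is made up for illustration, and it has not been run against your views.

```sql
-- grp is a hypothetical alias for the grouped result
SELECT TOP 20 grp.cnst_mstr_id
FROM (
    SELECT prsn_nm_a.cnst_mstr_id
    FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
    INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
        ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
    WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
    GROUP BY prsn_nm_a.cnst_mstr_id
    HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
) AS grp
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
    ON grp.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL   -- keep only ids with no match
```

Because the join is on the grouping column, this should qualify the same ids as the single-query version above (apart from which 20 rows TOP happens to pick, since there is no ORDER BY).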

Your original query is incorrect (WHERE goes before GROUP BY), so let me assume this is what you meant:

SELECT  TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
     arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
     ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1;

A left join that keeps only the non-matching rows is equivalent to using NOT EXISTS, so you can do this:

SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
     arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
     ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1 AND
       NOT EXISTS (SELECT 1
                   FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
                   WHERE prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
                  );
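
Since the NOT EXISTS condition only looks at cnst_mstr_id, which is the grouping column, it does not depend on any aggregate and could also go in the WHERE clause. A sketch of that variant, not tested against your data:

```sql
SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
     arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
     ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
  AND NOT EXISTS (SELECT 1   -- drop ids that already have a match, before grouping
                  FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
                  WHERE prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
                 )
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1;
```

This filters rows before they are grouped rather than filtering groups afterwards. Whether that makes any difference in speed depends on the plan the optimizer chooses.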

Your task can also be written without a self-join:

SELECT *
FROM
 (
   SELECT TOP 20 -- why TOP?
      cnst_mstr_id, bz_cnst_prsn_last_nm
   FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
   GROUP BY cnst_mstr_id, bz_cnst_prsn_last_nm      -- same customer & name
   HAVING COUNT(DISTINCT bz_cnst_prsn_first_nm) > 1 -- different first_names
 ) AS prsn_nm
WHERE NOT EXISTS 
 (
   SELECT * 
   FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
   WHERE prsn_nm.cnst_mstr_id = mstr_new.new_cnst_mstr_id
 )

Depending on the existing indexes, this might be faster than the self-join.

As Gordon already mentioned, LEFT JOIN ... IS NULL is the same as NOT EXISTS, and in Teradata the latter is usually more efficient.