LEFT JOIN after GROUP BY and HAVING
I have a view, cnst_prsn_nm. I want to find records that share the same cnst_mstr_id and the same last name but have different first names. So I wrote this in Teradata SQL:
SELECT TOP 20 prsn_nm_a.cnst_mstr_id FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
Then, for the cnst_mstr_ids of those records, I want to check another table, cnst_mstr. Basically, I want to find the rows where the left join IS NULL:
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL
So my query basically becomes:
SELECT TOP 20 prsn_nm_a.cnst_mstr_id FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL
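But now there are two WHERE clauses, and a LEFT JOIN cannot come directly after HAVING either. When there is a filter tied to the grouping, how do I do a left join after the GROUP BY and HAVING clauses?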
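The clauses in a SQL statement always appear in a specific order: first SELECT, then FROM, then the JOINs, then WHERE, then GROUP BY, then HAVING. You cannot deviate from that order, and you neither need nor can have a second WHERE clause. Make your single WHERE clause include all the conditions you need.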
SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
AND mstr_new.new_cnst_mstr_id IS NULL
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
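To see the single-WHERE form behave as intended, here is a minimal sketch on toy data. It runs on SQLite through Python rather than on Teradata, so TOP is omitted (SQLite does not have it), and the table names, column names, and rows are all made up for illustration.

```python
import sqlite3


def ids_missing_from_master():
    """Run the combined WHERE / GROUP BY / HAVING query on hypothetical toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
        CREATE TABLE cnst_mstr (new_cnst_mstr_id INTEGER);
        INSERT INTO prsn_nm VALUES
            (1, 'Ann', 'Lee'), (1, 'Bob', 'Lee'),  -- same last name, different first names
            (2, 'Cat', 'Kim'), (2, 'Dan', 'Kim'),  -- qualifies, but id 2 exists in cnst_mstr
            (3, 'Eve', 'Roe'), (3, 'Eve', 'Poe'),  -- same first name, so no <> pair
            (4, 'Fay', 'Sun');                     -- single row
        INSERT INTO cnst_mstr VALUES (2);
    """)
    rows = conn.execute("""
        SELECT a.cnst_mstr_id
        FROM prsn_nm a
        INNER JOIN prsn_nm b
                ON a.cnst_mstr_id = b.cnst_mstr_id
        LEFT JOIN cnst_mstr m
               ON a.cnst_mstr_id = m.new_cnst_mstr_id
        WHERE a.first_nm <> b.first_nm          -- row-level filter
          AND m.new_cnst_mstr_id IS NULL        -- anti-join filter, same WHERE
        GROUP BY a.cnst_mstr_id
        HAVING COUNT(DISTINCT a.last_nm) = 1    -- group-level filter
        ORDER BY a.cnst_mstr_id
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

On this data only id 1 survives all three filters.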
Your original query is not correct (WHERE goes before GROUP BY), so let me assume you meant this:
SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1;
A non-matching left join is equivalent to using NOT EXISTS, so you can do:
SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1 AND
NOT EXISTS (SELECT 1
FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
WHERE prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
);
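The equivalence between a non-matching left join and NOT EXISTS can be checked on a small example. This is only a sketch: it uses SQLite through Python instead of Teradata, the tables and rows are hypothetical, and NOT EXISTS is placed in WHERE rather than HAVING to keep the comparison minimal.

```python
import sqlite3


def anti_join_both_ways():
    """Return the ids absent from cnst_mstr, computed with each anti-join form."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT);
        CREATE TABLE cnst_mstr (new_cnst_mstr_id INTEGER);
        INSERT INTO prsn_nm VALUES (1, 'Ann'), (1, 'Bob'), (2, 'Cat'), (3, 'Dan');
        INSERT INTO cnst_mstr VALUES (2);
    """)
    # Form 1: LEFT JOIN, then keep only the rows that found no match.
    via_left_join = conn.execute("""
        SELECT DISTINCT p.cnst_mstr_id
        FROM prsn_nm p
        LEFT JOIN cnst_mstr m
               ON p.cnst_mstr_id = m.new_cnst_mstr_id
        WHERE m.new_cnst_mstr_id IS NULL
        ORDER BY p.cnst_mstr_id
    """).fetchall()
    # Form 2: correlated NOT EXISTS subquery.
    via_not_exists = conn.execute("""
        SELECT DISTINCT p.cnst_mstr_id
        FROM prsn_nm p
        WHERE NOT EXISTS (SELECT 1
                          FROM cnst_mstr m
                          WHERE m.new_cnst_mstr_id = p.cnst_mstr_id)
        ORDER BY p.cnst_mstr_id
    """).fetchall()
    conn.close()
    return [r[0] for r in via_left_join], [r[0] for r in via_not_exists]
```

Both forms return ids 1 and 3 here, since id 2 is the only one present in the toy cnst_mstr table.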
Your task can also be written without a self-join:
SELECT *
FROM
(
SELECT TOP 20 -- why TOP?
cnst_mstr_id, bz_cnst_prsn_last_nm
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
GROUP BY cnst_mstr_id, bz_cnst_prsn_last_nm -- same customer & name
HAVING COUNT(DISTINCT bz_cnst_prsn_first_nm) > 1 -- different first_names
) AS prsn_nm
WHERE NOT EXISTS
(
SELECT *
FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
WHERE prsn_nm.cnst_mstr_id = mstr_new.new_cnst_mstr_id
)
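Depending on the existing indexes, this might be faster than the self-join.
As Gordon already mentioned, LEFT JOIN ... IS NULL is the same as NOT EXISTS; in Teradata the latter is usually more efficient.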
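If you literally want the left join to happen after GROUP BY and HAVING, one option is to wrap the grouped query in a derived table and join to that. The sketch below assumes SQLite through Python as a stand-in for Teradata, with hypothetical tables and rows; it shows the derived-table shape, not a tuned Teradata query.

```python
import sqlite3


def grouped_then_left_joined():
    """Group and filter first, then anti-join the grouped result to cnst_mstr."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
        CREATE TABLE cnst_mstr (new_cnst_mstr_id INTEGER);
        INSERT INTO prsn_nm VALUES
            (1, 'Ann', 'Lee'), (1, 'Bob', 'Lee'),
            (2, 'Cat', 'Kim'), (2, 'Dan', 'Kim'),
            (3, 'Eve', 'Roe'), (3, 'Eve', 'Poe'),
            (5, 'Gus', 'Ray'), (5, 'Hal', 'Ray'), (5, 'Ian', 'Moe');
        INSERT INTO cnst_mstr VALUES (2);
    """)
    rows = conn.execute("""
        SELECT g.cnst_mstr_id
        FROM (
            SELECT cnst_mstr_id, last_nm
            FROM prsn_nm
            GROUP BY cnst_mstr_id, last_nm         -- same id and last name
            HAVING COUNT(DISTINCT first_nm) > 1    -- different first names
        ) AS g
        LEFT JOIN cnst_mstr m                      -- join applied after grouping
               ON g.cnst_mstr_id = m.new_cnst_mstr_id
        WHERE m.new_cnst_mstr_id IS NULL
        ORDER BY g.cnst_mstr_id
    """).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

On this data the result is ids 1 and 5. Id 5 qualifies because grouping by last name treats each last name separately, so an id that has more than one last name can still appear. The self-join form with HAVING COUNT(DISTINCT last_nm) = 1 would exclude it.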