How to prioritize OR order and account for NULLs in JOIN ON statement for multiple tables in Teradata SQL?
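I'm trying to join TABLEONE (a) and TABLETWO (b) on three key columns and then return a description, DESC, for about 1 billion rows of data.
While one of the columns is a simple join (a.TS = b.UCDE), the other two columns have various rules that must be applied in a specific order for the rows to match correctly.
I'm having trouble accounting for this behavior, especially what happens when some of the columns are NULL.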
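The rules are:
1. a.TS = b.UCDE (always)
2. If a.ICI IS NOT NULL, check whether it matches b.STM, given 1.
   a. If it finds a match, go to 3.
   b. If it does not find a match (i.e. b.STM IS NULL), then return DESC for that row.
   c. If a.ICI IS NULL, then join on the b.STM IS NULL rows, given 1.
3. a.S_ID can be 0000000000, NULL, or a value like 0000000300, while b.SRL can be NULL or a value like 000300 (note: a.S_ID has 10 characters while b.SRL has only 6, so the join is on the right 6 characters of a.S_ID)
   a. If they match, then priority should be on RIGHT(a.S_ID,6) = b.SRL
   b. If a.S_ID = 0000000000 or IS NULL, this can join to b.SRL IS NULL, but really the join is redundant because 1. and 2. should be enough in this case.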
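To make the intended priority order concrete, here is a minimal sketch of the rules as a plain Python function, checked against the TABLE B sample rows given further down. The helper name `pick_desc` is hypothetical, `None` stands in for NULL, and two points are assumptions rather than something the rules state outright: rule 2b is read as "fall back to the NULL STM rows", and the NULL SRL fallback is applied whenever no exact SRL match exists, because that is what the EXPECTED DESC column implies.

```python
from typing import Optional, Sequence, Tuple

# (UCDE, STM, SRL, DESC) rows copied from the TABLE B sample; None stands in for NULL.
BRow = Tuple[str, Optional[str], Optional[str], str]
TABLE_B: Sequence[BRow] = [
    ("1001", "1A", "000300", "A"),
    ("1001", "2B", "000300", "B"),
    ("1001", "2B", "000600", "C"),
    ("1001", "2B", None, "D"),
    ("1001", "3C", None, "E"),
    ("1001", "4D", None, "F"),
    ("1001", None, "000300", "G"),
    ("1001", None, "000500", "H"),
    ("1001", None, None, "I"),
]


def pick_desc(ts: str, ici: Optional[str], s_id: Optional[str],
              b_rows: Sequence[BRow] = TABLE_B) -> Optional[str]:
    """Hypothetical helper: the DESC the rules pick for one TABLEONE row."""
    # Rule 1: a.TS = b.UCDE always applies.
    candidates = [b for b in b_rows if b[0] == ts]

    # Rules 2a/2c: keep rows whose STM equals ICI. Python's None == None is
    # True, so a NULL ICI pairs with the NULL STM rows here (unlike SQL '=').
    by_stm = [b for b in candidates if b[1] == ici]
    if not by_stm:
        # Rule 2b (assumed reading): no STM match, fall back to NULL STM rows.
        by_stm = [b for b in candidates if b[1] is None]

    # Rule 3a: an exact match on the right 6 characters of S_ID wins.
    key = s_id[-6:] if s_id is not None else None
    for b in by_stm:
        if key is not None and b[2] == key:
            return b[3]

    # Rule 3b (extended per EXPECTED DESC): otherwise take the NULL SRL row.
    for b in by_stm:
        if b[2] is None:
            return b[3]
    return None


print(pick_desc("1001", "2B", "0000000300"))  # B (exact SRL match)
print(pick_desc("1001", "3C", "0000000600"))  # E (falls back to the NULL SRL row)
print(pick_desc("1001", None, None))          # I (NULL ICI and NULL S_ID)
```

The point of the sketch is that the rules describe a first-match-wins search, whereas a join condition built from OR branches keeps every row that satisfies any branch.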
SELECT a.TDATE, a.AMT, a.ISI, a.TS, b.UCDE, a.ICI, b.STM, a.S_ID,
       CASE WHEN b.SRL1 IS NULL THEN '0000000000' ELSE b.SRL1 END AS SRL, a.RES,
       -- Manually override certain conditions
       CASE WHEN (a.ISI = 'AP' AND a.TS = '1234') THEN 'APP'
            WHEN (a.ISI = 'JK' AND a.TS = '1234') THEN 'JAX'
            ELSE b.DESC END AS DESC
FROM TABLEONE as a
FULL JOIN TABLETWO AS b
  ON ( a.TS = b.UCDE
       AND (a.ICI = b.STM OR (a.ICI IS NULL and b.STM IS NULL) OR b.STM IS NULL)
       AND (RIGHT(a.S_ID,6) = RIGHT(SRL,6) OR a.S_ID IS NULL)
     )
ORDER BY (CASE WHEN STM IS NULL THEN 2 ELSE 1 END)
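My current code works inconsistently: in some cases I get duplicates from the OR b.STM IS NULL or the OR a.S_ID IS NULL branch, while in other cases it seems to work fine.
I've been tinkering with it, but the inconsistent results are confusing, and I'm not sure what I'm doing wrong or whether there is a better way to handle the NULL conditions.
Edit to add an example (EXPECTED DESC is my expected result):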
TABLE A:
+----+------------+-----+-----+------+--------+------------+-----+---------------+
| #  | TDATE      | AMT | ISI | TS   | ICI    | S_ID       | RES | EXPECTED DESC |
+----+------------+-----+-----+------+--------+------------+-----+---------------+
| 1  | 2019-09-01 | 94  | DC  | 1001 | 1A     | 0000000300 | PX  | A             |
| 2  | 2019-09-01 | 35  | DC  | 1001 | 2B     | 0000000300 | DL  | B             |
| 3  | 2019-09-01 | 40  | DC  | 1001 | 2B     | 0000000600 | JI  | C             |
| 4  | 2019-09-01 | 65  | DC  | 1001 | 2B     | <NULL>     | WO  | D             |
| 5  | 2019-09-02 | 95  | AC  | 1001 | 2B     | 0000000000 | FK  | D             |
| 6  | 2019-09-03 | 10  | AC  | 1001 | 3C     | <NULL>     | SL  | E             |
| 7  | 2019-09-04 | 8   | AC  | 1001 | 3C     | 0000000000 | FH  | E             |
| 8  | 2019-09-05 | 40  | DC  | 1001 | 3C     | 0000000600 | WO  | E             |
| 9  | 2019-09-06 | 65  | DC  | 1001 | 4D     | <NULL>     | FK  | F             |
| 10 | 2019-09-07 | 95  | AC  | 1001 | 4D     | 0000000000 | SL  | F             |
| 11 | 2019-09-08 | 10  | AC  | 1001 | 4D     | 0000000600 | FH  | F             |
| 12 | 2019-09-09 | 8   | AC  | 1001 | <NULL> | 0000000300 | WO  | G             |
| 13 | 2019-09-10 | 40  | DC  | 1001 | <NULL> | 0000000500 | FK  | H             |
| 14 | 2019-09-11 | 65  | DC  | 1001 | <NULL> | <NULL>     | SL  | I             |
| 15 | 2019-09-12 | 95  | AC  | 1001 | <NULL> | 0000000000 | FH  | I             |
+----+------------+-----+-----+------+--------+------------+-----+---------------+
TABLE B:
+------+--------+--------+------+
| UCDE | STM    | SRL    | DESC |
+------+--------+--------+------+
| 1001 | 1A     | 000300 | A    |
| 1001 | 2B     | 000300 | B    |
| 1001 | 2B     | 000600 | C    |
| 1001 | 2B     | <NULL> | D    |
| 1001 | 3C     | <NULL> | E    |
| 1001 | 4D     | <NULL> | F    |
| 1001 | <NULL> | 000300 | G    |
| 1001 | <NULL> | 000500 | H    |
| 1001 | <NULL> | <NULL> | I    |
+------+--------+--------+------+
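To see where the duplicates come from, the ON clause of the query above can be run against these sample rows. The sketch below uses Python's built-in `sqlite3` as a stand-in for Teradata, so several things are assumptions: `substr(x, -6)` replaces `RIGHT(x, 6)`, the `#` column is called `n`, `DESC_` is a placeholder name because DESC is a reserved word, and a LEFT JOIN with a count replaces the FULL JOIN so that the number of TABLETWO matches per TABLEONE row is visible. The NULL comparison semantics are standard SQL in both systems.

```python
import sqlite3

# (n, ICI, S_ID, EXPECTED DESC) from TABLE A; TS is 1001 on every sample row.
A_ROWS = [
    (1, "1A", "0000000300", "A"), (2, "2B", "0000000300", "B"),
    (3, "2B", "0000000600", "C"), (4, "2B", None, "D"),
    (5, "2B", "0000000000", "D"), (6, "3C", None, "E"),
    (7, "3C", "0000000000", "E"), (8, "3C", "0000000600", "E"),
    (9, "4D", None, "F"), (10, "4D", "0000000000", "F"),
    (11, "4D", "0000000600", "F"), (12, None, "0000000300", "G"),
    (13, None, "0000000500", "H"), (14, None, None, "I"),
    (15, None, "0000000000", "I"),
]
# (STM, SRL, DESC) from TABLE B; UCDE is 1001 on every sample row.
B_ROWS = [
    ("1A", "000300", "A"), ("2B", "000300", "B"), ("2B", "000600", "C"),
    ("2B", None, "D"), ("3C", None, "E"), ("4D", None, "F"),
    (None, "000300", "G"), (None, "000500", "H"), (None, None, "I"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableone (n INTEGER, TS TEXT, ICI TEXT, S_ID TEXT)")
conn.execute("CREATE TABLE tabletwo (UCDE TEXT, STM TEXT, SRL TEXT, DESC_ TEXT)")
conn.executemany("INSERT INTO tableone VALUES (?, '1001', ?, ?)",
                 [(n, ici, s_id) for n, ici, s_id, _ in A_ROWS])
conn.executemany("INSERT INTO tabletwo VALUES ('1001', ?, ?, ?)", B_ROWS)

# Same ON condition as the question's query; COUNT(b.UCDE) skips the
# NULL-extended row, so an unmatched TABLEONE row reports 0.
match_counts = dict(conn.execute("""
    SELECT a.n, COUNT(b.UCDE)
    FROM tableone AS a
    LEFT JOIN tabletwo AS b
      ON a.TS = b.UCDE
     AND (a.ICI = b.STM OR (a.ICI IS NULL AND b.STM IS NULL) OR b.STM IS NULL)
     AND (substr(a.S_ID, -6) = substr(b.SRL, -6) OR a.S_ID IS NULL)
    GROUP BY a.n
"""))

for n in sorted(match_counts):
    print(n, match_counts[n])
```

On this data, row 4 (ICI 2B, NULL S_ID) picks up six TABLETWO rows because `a.S_ID IS NULL` switches the SRL test off entirely, while row 5 (S_ID 0000000000) matches nothing because no SRL equals 000000. That mix of fan-out and misses would be consistent with the inconsistent results described.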
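From what I understand, you're trying to join TABLEONE and TABLETWO on the following conditions:
1. a.TS = b.UCDE
2. a.ICI IS NOT NULL AND a.ICI = b.STM, plus logic #3; otherwise join on b.STM IS NULL
3. RIGHT(a.S_ID,6) = b.SRL; otherwise, if a.S_ID = 0000000000 or IS NULL, join on b.SRL IS NULL
I took the logic you specified and reworked your JOIN conditions: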
SELECT a.TDATE, a.AMT, a.ISI, a.TS, b.UCDE, a.ICI, b.STM, a.S_ID,
       CASE WHEN b.SRL1 IS NULL THEN '0000000000' ELSE b.SRL1 END AS SRL, a.RES,
       -- Manually override certain conditions
       CASE WHEN (a.ISI = 'AP' AND a.TS = '1234') THEN 'APP'
            WHEN (a.ISI = 'JK' AND a.TS = '1234') THEN 'JAX'
            ELSE b.DESC END AS DESC
FROM TABLEONE AS a
FULL OUTER JOIN TABLETWO AS b ON a.TS = b.UCDE
AND (
    (
        a.ICI = b.STM AND -- a.ICI IS NOT NULL --> check for match on b.STM (NULL = NULL is never true)
        (
            (RIGHT(a.S_ID, 6) = b.SRL) OR -- Priority should be on RIGHT(a.S_ID,6) = b.SRL
            -- Literals quoted because a.S_ID is described as a 10-character value
            (COALESCE(a.S_ID, '0000000000') = '0000000000' AND b.SRL IS NULL)
        )
    ) OR
    (a.ICI IS NULL AND b.STM IS NULL) -- a.ICI IS NULL then join on b.STM IS NULL
)
ORDER BY (CASE WHEN STM IS NULL THEN 2 ELSE 1 END)
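I don't have a TD system to test on, but give it a try and let me know how it goes.
Also, I don't quite follow this line: "If it does not find a match (i.e. b.STM IS NULL), then return DESC for that row." I assume that is the logic in the CASE expression in your SELECT, so I didn't touch that part.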
Update
Based on the query in my answer, here is a truth table:
Case  a.ICI  b.STM  "Match"  Join rows?
---------------------------------------------------
1     NULL   NULL   NO       YES
2     5E     NULL   NO       NO
3     NULL   6B     NO       NO
4     5E     6B     NO       NO
5     5E     5E     YES      Logic #3 (a.S_ID = b.SRL?)
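Is this the logic you're looking for? If you get the expected rows, then you only need to add a COALESCE or CASE expression in the SELECT to display the correct DESC value.
If the logic in the table above is not what you want and you're not getting the expected rows, then, based on your comment about b.STM IS NULL being a catch-all, I think you can change the line (a.ICI IS NULL AND b.STM IS NULL) to simply (b.STM IS NULL). That will make the join apply to case #2 as well.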
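The truth table follows from SQL's three-valued logic, which can be modelled in a few lines of Python. This is only an illustrative sketch: the helper names (`sql_eq`, `sql_and`, `sql_or`, `stm_condition`, `stm_catch_all`) are made up here, `None` plays the role of both NULL and the "unknown" truth value, and it covers just the ICI/STM part of the ON clause, not logic #3.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None for SQL's "unknown"


def sql_eq(x: Optional[str], y: Optional[str]) -> Tri:
    """SQL '=': unknown as soon as either operand is NULL."""
    if x is None or y is None:
        return None
    return x == y


def sql_and(p: Tri, q: Tri) -> Tri:
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def sql_or(p: Tri, q: Tri) -> Tri:
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False


def stm_condition(ici: Optional[str], stm: Optional[str]) -> Tri:
    """(a.ICI = b.STM) OR (a.ICI IS NULL AND b.STM IS NULL)."""
    return sql_or(sql_eq(ici, stm), sql_and(ici is None, stm is None))


def stm_catch_all(ici: Optional[str], stm: Optional[str]) -> Tri:
    """Variant with the second branch relaxed to just b.STM IS NULL."""
    return sql_or(sql_eq(ici, stm), stm is None)


# An ON clause keeps a row pair only when the condition is exactly True.
for case, (ici, stm) in enumerate(
        [(None, None), ("5E", None), (None, "6B"), ("5E", "6B"), ("5E", "5E")],
        start=1):
    print(case, ici, stm, stm_condition(ici, stm) is True,
          stm_catch_all(ici, stm) is True)
```

Cases 2 and 3 evaluate to unknown rather than false under the first form, but the join treats both the same way: the pair is dropped. Only the catch-all variant turns case 2 into a match.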
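I doubt you can get the expected result from a single join without producing multiple matching rows, which then have to be removed with an OLAP function like ROW_NUMBER (this can be very expensive on a large table).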
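For comparison, the single-join-plus-ROW_NUMBER route might look roughly like the sketch below. It is not tested on Teradata: it runs on Python's built-in `sqlite3` (window functions need SQLite 3.25 or later), `substr(x, -6)` stands in for `RIGHT(x, 6)`, the `n` and `DESC_` column names are placeholders, and the ranking rule (exact SRL match first, NULL SRL second) is an assumed reading of the priority described in the question. On Teradata the `rn = 1` filter would more typically be written with QUALIFY.

```python
import sqlite3

# A few (n, ICI, S_ID, EXPECTED DESC) rows from TABLE A; TS is 1001 throughout.
A_ROWS = [
    (2, "2B", "0000000300", "B"),
    (4, "2B", None, "D"),
    (8, "3C", "0000000600", "E"),
    (12, None, "0000000300", "G"),
    (14, None, None, "I"),
]
# (STM, SRL, DESC) from TABLE B; UCDE is 1001 on every sample row.
B_ROWS = [
    ("1A", "000300", "A"), ("2B", "000300", "B"), ("2B", "000600", "C"),
    ("2B", None, "D"), ("3C", None, "E"), ("4D", None, "F"),
    (None, "000300", "G"), (None, "000500", "H"), (None, None, "I"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableone (n INTEGER, TS TEXT, ICI TEXT, S_ID TEXT)")
conn.execute("CREATE TABLE tabletwo (UCDE TEXT, STM TEXT, SRL TEXT, DESC_ TEXT)")
conn.executemany("INSERT INTO tableone VALUES (?, '1001', ?, ?)",
                 [(n, ici, s_id) for n, ici, s_id, _ in A_ROWS])
conn.executemany("INSERT INTO tabletwo VALUES ('1001', ?, ?, ?)", B_ROWS)

# Join to every candidate (exact SRL match or NULL SRL), then keep only the
# best-ranked candidate per TABLEONE row.
ranked = conn.execute("""
    SELECT n, DESC_
    FROM (
        SELECT a.n, b.DESC_,
               ROW_NUMBER() OVER (
                   PARTITION BY a.n
                   ORDER BY CASE WHEN substr(a.S_ID, -6) = b.SRL THEN 1 ELSE 2 END
               ) AS rn
        FROM tableone AS a
        LEFT JOIN tabletwo AS b
          ON a.TS = b.UCDE
         AND (a.ICI = b.STM OR (a.ICI IS NULL AND b.STM IS NULL))
         AND (substr(a.S_ID, -6) = b.SRL OR b.SRL IS NULL)
    ) AS candidates
    WHERE rn = 1
    ORDER BY n
""").fetchall()

print(ranked)
print(ranked == [(n, d) for n, _, _, d in A_ROWS])
```

The cost concern raised above applies here: the join first materializes every candidate pair and only then discards all but one per row.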
Try a second join to tabletwo instead (of course, you then need a COALESCE for every column taken from that table):
SELECT a.TDATE, a.AMT, a.ISI, a.TS,
       Coalesce(b1.UCDE, b2.UCDE), a.ICI,
       Coalesce(b1.STM, b2.STM), a.S_ID,
       Coalesce(b1.SRL, b2.SRL, '0000000000') AS SRL,
       a.RES,
       -- Manually override certain conditions
       CASE
         WHEN (a.ISI = 'AP' AND a.TS = '1234') THEN 'APP'
         WHEN (a.ISI = 'JK' AND a.TS = '1234') THEN 'JAX'
         ELSE Coalesce(b1.DESC, b2.DESC)
       END AS DESC_
FROM tableone AS a
LEFT JOIN tabletwo AS b1
  ON a.TS = b1.UCDE
 AND (a.ICI = b1.STM OR (a.ICI IS NULL AND b1.STM IS NULL))
 AND (Coalesce(Right(a.S_ID,6), '000000') = Coalesce(b1.SRL, '000000'))
LEFT JOIN tabletwo AS b2
  ON a.TS = b2.UCDE
 AND (a.ICI = b2.STM OR (a.ICI IS NULL AND b2.STM IS NULL))
 AND b1.UCDE IS NULL -- join only if no match based on SRL, yet
Performance will mainly depend on the PI (primary index) of tableone and the number of rows in tabletwo.
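As a sanity check, the two-join condition can be replayed against the sample tables from the question. The sketch below is not a Teradata test: it uses Python's built-in `sqlite3`, with `substr(x, -6)` standing in for `Right(x, 6)`, a column `n` for the `#` column, and only the columns needed to compare the result with EXPECTED DESC.

```python
import sqlite3

# (n, ICI, S_ID, EXPECTED DESC) from TABLE A; TS is 1001 on every sample row.
A_ROWS = [
    (1, "1A", "0000000300", "A"), (2, "2B", "0000000300", "B"),
    (3, "2B", "0000000600", "C"), (4, "2B", None, "D"),
    (5, "2B", "0000000000", "D"), (6, "3C", None, "E"),
    (7, "3C", "0000000000", "E"), (8, "3C", "0000000600", "E"),
    (9, "4D", None, "F"), (10, "4D", "0000000000", "F"),
    (11, "4D", "0000000600", "F"), (12, None, "0000000300", "G"),
    (13, None, "0000000500", "H"), (14, None, None, "I"),
    (15, None, "0000000000", "I"),
]
# (STM, SRL, DESC) from TABLE B; UCDE is 1001 on every sample row.
B_ROWS = [
    ("1A", "000300", "A"), ("2B", "000300", "B"), ("2B", "000600", "C"),
    ("2B", None, "D"), ("3C", None, "E"), ("4D", None, "F"),
    (None, "000300", "G"), (None, "000500", "H"), (None, None, "I"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableone (n INTEGER, TS TEXT, ICI TEXT, S_ID TEXT)")
conn.execute("CREATE TABLE tabletwo (UCDE TEXT, STM TEXT, SRL TEXT, DESC_ TEXT)")
conn.executemany("INSERT INTO tableone VALUES (?, '1001', ?, ?)",
                 [(n, ici, s_id) for n, ici, s_id, _ in A_ROWS])
conn.executemany("INSERT INTO tabletwo VALUES ('1001', ?, ?, ?)", B_ROWS)

# b1: STM match plus SRL match (NULL and all-zero keys treated as '000000').
# b2: STM match only, consulted when b1 found nothing.
actual = conn.execute("""
    SELECT a.n, COALESCE(b1.DESC_, b2.DESC_)
    FROM tableone AS a
    LEFT JOIN tabletwo AS b1
      ON a.TS = b1.UCDE
     AND (a.ICI = b1.STM OR (a.ICI IS NULL AND b1.STM IS NULL))
     AND COALESCE(substr(a.S_ID, -6), '000000') = COALESCE(b1.SRL, '000000')
    LEFT JOIN tabletwo AS b2
      ON a.TS = b2.UCDE
     AND (a.ICI = b2.STM OR (a.ICI IS NULL AND b2.STM IS NULL))
     AND b1.UCDE IS NULL
    ORDER BY a.n
""").fetchall()

expected = [(n, desc) for n, _, _, desc in A_ROWS]
print(actual == expected)
```

On these 15 rows each TABLEONE row comes back exactly once with its expected description. One caveat the sample does not exercise: a row whose ICI matches several tabletwo rows while its S_ID matches none of their SRL values would still fan out in the b2 join, so data like that would need an extra restriction.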