How to prioritize OR order and account for NULLs in JOIN ON statement for multiple tables in Teradata SQL?

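I am trying to join TABLEONE (a) and TABLETWO (b) on three key columns and return the description DESC for roughly 1 billion rows of data.

While one of the columns is a simple join (a.TS = b.UCDE), the other two have various rules that need to be applied in a specific order for the rows to match correctly.

I am having trouble accounting for this behavior, especially what happens when some of the columns contain NULL values.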

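The rules are:

  1. a.TS = b.UCDE (always)
  2. If a.ICI IS NOT NULL, then check whether it matches b.STM, given 1.
     a. If a match is found, go to 3.
     b. If it does not find a match (i.e. b.STM IS NULL), then return DESC for that row.
     c. If a.ICI IS NULL, then join on the rows where b.STM IS NULL, given 1.
  3. a.S_ID can be 0000000000, NULL, or a value like 0000000300, while b.SRL can be NULL or a value like 000300 (note: a.S_ID has 10 characters while b.SRL has only 6, so the join is on the rightmost 6 characters of a.S_ID)
     a. If there is a match, priority should be on RIGHT(a.S_ID,6) = b.SRL
     b. If a.S_ID = 0000000000 or IS NULL, it can join to b.SRL IS NULL, but that join is really redundant, because 1. and 2. should suffice in this case.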

SELECT a.TDATE, a.AMT, a.ISI, a.TS, b.UCDE, a.ICI, b.STM, a.S_ID,
CASE WHEN b.SRL1 IS NULL THEN '0000000000' ELSE b.SRL1 END AS SRL, a.RES,
-- Manually override certain conditions
CASE WHEN (a.ISI = 'AP' AND a.TS = '1234') THEN 'APP'
     WHEN (a.ISI = 'JK' AND a.TS = '1234') THEN 'JAX'
     ELSE b.DESC END AS DESC
FROM TABLEONE as a
FULL JOIN TABLETWO AS b
    ON  ( a.TS = b.UCDE
        AND (a.ICI = b.STM OR (a.ICI IS NULL and b.STM IS NULL) OR b.STM IS NULL)
        AND (RIGHT(a.S_ID,6) = RIGHT(SRL,6) OR a.S_ID IS NULL)
        )
ORDER BY (CASE WHEN STM IS NULL THEN 2 ELSE 1 END)

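My current code works inconsistently: in some cases I get duplicates from the OR b.STM IS NULL and OR a.S_ID IS NULL conditions. In other cases it appears to work fine.

I have been tinkering with it, but the inconsistent results are confusing, and I am not sure what I am doing wrong or whether there is a better way to handle the NULL conditions.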


Edited to add an example (EXPECTED DESC is my expected result):

TABLE A:
+----+------------+-----+-----+------+--------+------------+-----+---------------+
| #  |   TDATE    | AMT | ISI |  TS  |  ICI   |    S_ID    | RES | EXPECTED DESC |
+----+------------+-----+-----+------+--------+------------+-----+---------------+
|  1 | 2019-09-01 |  94 | DC  | 1001 | 1A     | 0000000300 | PX  | A             |
|  2 | 2019-09-01 |  35 | DC  | 1001 | 2B     | 0000000300 | DL  | B             |
|  3 | 2019-09-01 |  40 | DC  | 1001 | 2B     | 0000000600 | JI  | C             |
|  4 | 2019-09-01 |  65 | DC  | 1001 | 2B     | <NULL>     | WO  | D             |
|  5 | 2019-09-02 |  95 | AC  | 1001 | 2B     | 0000000000 | FK  | D             |
|  6 | 2019-09-03 |  10 | AC  | 1001 | 3C     | <NULL>     | SL  | E             |
|  7 | 2019-09-04 |   8 | AC  | 1001 | 3C     | 0000000000 | FH  | E             |
|  8 | 2019-09-05 |  40 | DC  | 1001 | 3C     | 0000000600 | WO  | E             |
|  9 | 2019-09-06 |  65 | DC  | 1001 | 4D     | <NULL>     | FK  | F             |
| 10 | 2019-09-07 |  95 | AC  | 1001 | 4D     | 0000000000 | SL  | F             |
| 11 | 2019-09-08 |  10 | AC  | 1001 | 4D     | 0000000600 | FH  | F             |
| 12 | 2019-09-09 |   8 | AC  | 1001 | <NULL> | 0000000300 | WO  | G             |
| 13 | 2019-09-10 |  40 | DC  | 1001 | <NULL> | 0000000500 | FK  | H             |
| 14 | 2019-09-11 |  65 | DC  | 1001 | <NULL> | <NULL>     | SL  | I             |
| 15 | 2019-09-12 |  95 | AC  | 1001 | <NULL> | 0000000000 | FH  | I             |
+----+------------+-----+-----+------+--------+------------+-----+---------------+
TABLE B:
+------+--------+--------+------+
| UCDE |  STM   |  SRL   | DESC |
+------+--------+--------+------+
| 1001 | 1A     | 000300 | A    |
| 1001 | 2B     | 000300 | B    |
| 1001 | 2B     | 000600 | C    |
| 1001 | 2B     | <NULL> | D    |
| 1001 | 3C     | <NULL> | E    |
| 1001 | 4D     | <NULL> | F    |
| 1001 | <NULL> | 000300 | G    |
| 1001 | <NULL> | 000500 | H    |
| 1001 | <NULL> | <NULL> | I    |
+------+--------+--------+------+

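As I understand it, you are trying to join TABLEONE and TABLETWO on the following conditions:

  1. a.TS = b.UCDE

  2. a.ICI IS NOT NULL AND a.ICI = b.STM, plus logic #3; otherwise join on b.STM IS NULL

  3. RIGHT(a.S_ID,6) = b.SRL; otherwise, if a.S_ID = 0000000000 or IS NULL, join on b.SRL IS NULL

I took the logic you specified and reworked your JOIN conditions: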

SELECT a.TDATE, a.AMT, a.ISI, a.TS, b.UCDE, a.ICI, b.STM, a.S_ID,
CASE WHEN b.SRL1 IS NULL THEN '0000000000' ELSE b.SRL1 END AS SRL, a.RES,
-- Manually override certain conditions
CASE WHEN (a.ISI = 'AP' AND a.TS = '1234') THEN 'APP'
     WHEN (a.ISI = 'JK' AND a.TS = '1234') THEN 'JAX'
     ELSE b.DESC END AS DESC
FROM TABLEONE AS a
FULL OUTER JOIN TABLETWO AS b ON a.TS = b.UCDE
  AND (
    (
      a.ICI = b.STM AND -- TRUE only when both are non-NULL and equal (NULL = NULL is not TRUE)
      (
        (RIGHT(a.S_ID, 6) = b.SRL) OR -- Priority should be on RIGHT(a.S_ID,6) = b.SRL
        (COALESCE(a.S_ID, '0000000000') = '0000000000' AND b.SRL IS NULL) -- S_ID is a 10-character string, so compare against a string literal
      )
    ) OR
    (a.ICI IS NULL AND b.STM IS NULL) -- a.ICI IS NULL then join on b.STM IS NULL
  )
ORDER BY (CASE WHEN STM IS NULL THEN 2 ELSE 1 END)

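I don't have a TD system to test on, but give it a try and let me know how it goes.

Also, I didn't quite follow this line: "If it does not find a match (i.e. b.STM IS NULL), then return DESC for that row." I assume that is the logic in the CASE expression in your SELECT, so I didn't touch that part.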
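The join condition above leans on how SQL compares NULLs, so a quick check may help when reasoning about it. The following is only a minimal sketch: the CAST to CHAR(2) is an assumption made so the NULL literals have a type, and the column aliases are made up for illustration.

```sql
-- A comparison with NULL evaluates to UNKNOWN, never TRUE,
-- so both "=" and "<>" fall through to the ELSE branch.
SELECT
    CASE WHEN CAST(NULL AS CHAR(2)) =  CAST(NULL AS CHAR(2))
         THEN 'TRUE' ELSE 'not TRUE' END AS null_eq_null,   -- 'not TRUE'
    CASE WHEN CAST(NULL AS CHAR(2)) <> CAST(NULL AS CHAR(2))
         THEN 'TRUE' ELSE 'not TRUE' END AS null_ne_null,   -- 'not TRUE'
    CASE WHEN CAST(NULL AS CHAR(2)) IS NULL
         THEN 'TRUE' ELSE 'not TRUE' END AS null_is_null;   -- 'TRUE'
```

This is why a.ICI = b.STM can never pair a NULL with a NULL, and why the NULL-to-NULL case needs its own explicit IS NULL branch in the ON clause.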

Update
Based on the query in my answer, here is a truth table:

Case    a.ICI   b.STM   "Match" Join rows?
------------------------------------------
1       NULL    NULL    NO      YES
2       5E      NULL    NO      NO
3       NULL    6B      NO      NO
4       5E      6B      NO      NO
5       5E      5E      YES     Logic #3 (a.S_ID = b.SRL?)

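Is this the logic you are looking for? If you are getting the expected rows, then you only need to add a COALESCE or CASE expression in the SELECT to display the correct DESC value.

If the logic in the table above is not what you want and you are not getting the expected rows, then, based on your comment about b.STM IS NULL being a catch-all, I think you can change this line (a.ICI IS NULL AND b.STM IS NULL) to simply (b.STM IS NULL). That would make the join apply to case #2 as well.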
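Note that a bare (b.STM IS NULL) catch-all can match several TABLETWO rows for a single TABLEONE row, which is one way duplicates appear. A rough way to check is to count matches per source row. This sketch assumes TABLEONE has a unique key, called ROW_KEY here purely for illustration; the question does not show one.

```sql
-- Diagnostic sketch: list TABLEONE rows that match more than one TABLETWO row.
-- ROW_KEY is a hypothetical unique key on TABLEONE (not shown in the question).
SELECT a.ROW_KEY,
       COUNT(*) AS match_cnt
FROM TABLEONE AS a
INNER JOIN TABLETWO AS b
   ON a.TS = b.UCDE
  AND (
        (
          a.ICI = b.STM AND
          (
            RIGHT(a.S_ID, 6) = b.SRL OR
            (COALESCE(a.S_ID, '0000000000') = '0000000000' AND b.SRL IS NULL)
          )
        ) OR
        b.STM IS NULL          -- the catch-all variant discussed above
      )
GROUP BY a.ROW_KEY
HAVING COUNT(*) > 1;
```

If this returns rows, the join condition alone is not selective enough and some tie-breaking is still needed.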

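I doubt you can get the expected result with a single join without multiple rows matching, which would then have to be removed using an OLAP function like ROW_NUMBER (and that can be very expensive on a large table).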
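For reference, the single-join-plus-ROW_NUMBER variant might look roughly like the sketch below. It is not tested on a Teradata system and rests on two assumptions: TABLEONE has a unique key (called ROW_KEY here, a hypothetical name), and an STM match should outrank an SRL match when the two compete. DESC is quoted because it is a reserved word.

```sql
-- Sketch: let every candidate row match, then keep the best one per TABLEONE row.
-- ROW_KEY is a hypothetical unique key on TABLEONE.
SELECT a.TDATE, a.AMT, a.ISI, a.TS, a.ICI, a.S_ID, a.RES,
       b.STM, b.SRL,
       b."DESC" AS DESC_
FROM TABLEONE AS a
LEFT JOIN TABLETWO AS b
   ON a.TS = b.UCDE
  AND (a.ICI = b.STM OR b.STM IS NULL)              -- exact STM match or NULL fallback
  AND (RIGHT(a.S_ID, 6) = b.SRL OR b.SRL IS NULL)   -- exact SRL match or NULL fallback
QUALIFY ROW_NUMBER() OVER (
          PARTITION BY a.ROW_KEY
          ORDER BY CASE WHEN b.STM IS NULL THEN 1 ELSE 0 END,  -- prefer an STM match
                   CASE WHEN b.SRL IS NULL THEN 1 ELSE 0 END   -- then prefer an SRL match
        ) = 1;
```

Traced by hand against the sample tables in the question, this ordering appears to select the EXPECTED DESC value for each of the 15 rows, but the window function has to sort every candidate match, which is the cost concern mentioned above.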

Try a second join to tabletwo instead (of course, you then need a COALESCE for every column from that table):

SELECT a.TDATE, a.AMT, a.ISI, a.TS,
   Coalesce(b1.UCDE, b2.UCDE), a.ICI,
   Coalesce(b1.STM,  b2.STM), a.S_ID,
   Coalesce(b1.SRL,  b2.SRL, '0000000000') AS SRL,
   a.RES,
   -- Manually override certain conditions
   CASE
      WHEN (a.ISI = 'AP' AND a.TS = '1234') THEN 'APP'
      WHEN (a.ISI = 'JK' AND a.TS = '1234') THEN 'JAX'
      ELSE Coalesce(b1.DESC, b2.DESC)
   END AS DESC_
FROM tableone AS a
LEFT JOIN tabletwo AS b1
  ON a.TS = b1.UCDE
 AND (a.ICI = b1.STM OR (a.ICI IS NULL AND b1.STM IS NULL))
 AND (Coalesce(Right(a.S_ID,6), '000000') = Coalesce(b1.SRL, '000000'))
LEFT JOIN tabletwo AS b2
  ON a.TS = b2.UCDE
 AND (a.ICI = b2.STM OR (a.ICI IS NULL AND b2.STM IS NULL))
 AND b1.UCDE IS NULL -- join only if no match based on SRL, yet

Performance will mainly depend on the PI (primary index) of tableone and the number of rows in tabletwo.