teradata, reset when, partition by, order by

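I need help understanding the code below. I have never seen RESET WHEN used in Teradata before. What does RESET WHEN do in Teradata? I understand the PARTITION BY and ORDER BY parts. I was also unsure why this wasn't partitioned by PARTITION BY A.ACCT_DIM_NB, A.DAY_TIME_DIM_NB ORDER BY A.TXN_POSTING_SEQ. Also, is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW just using the whole partitioned window?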

Removed

I was also unsure why this wasn't partitioned by PARTITION BY Y.ACCT_DIM_NB, Y.DAY_TIME_DIM_NB ORDER BY Y.DAY_TIME_DIM_NB, Y.TXN_POSTING_SEQ

I don't know, but that would return a different result. (Y.DAY_TIME_DIM_NB would not be needed in the ORDER BY, because the data is already partitioned by it.)

Also, is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW just using the whole partitioned window?

It's exactly the same as ROWS UNBOUNDED PRECEDING, i.e. a syntax variation for a cumulative max. The whole partition would be ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

What does RESET WHEN do in Teradata?

RESET WHEN is a Teradata extension that adds a kind of dynamic partition. It's a shorter syntax for two (in your case) or three nested OLAP functions:

-- using RESET WHEN
MAX(A.RUN_BAL_AM)
OVER (PARTITION BY A.ACCT_DIM_NB
      ORDER BY A.DAY_TIME_DIM_NB, A.TXN_POSTING_SEQ 
      RESET WHEN A.CS_TXN_CD NOT IN ('072','075','079','107','111','112','139','181','318') 
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS  EOD_BAL_AM



-- Same result using Standard SQL
SELECT
   Max(dt.RUN_BAL_AM)
   Over (PARTITION BY dt.ACCT_DIM_NB, dt.dynamic_partition
         ORDER BY dt.DAY_TIME_DIM_NB, dt.TXN_POSTING_SEQ
         ROWS BETWEEN Unbounded Preceding AND CURRENT ROW) AS EOD_BAL_AM

FROM
 (
   SELECT
      -- columns the outer query needs
      A.ACCT_DIM_NB, A.DAY_TIME_DIM_NB, A.TXN_POSTING_SEQ, A.RUN_BAL_AM,
      -- this cumulative sum over 0/1 assigns a new value for each series of rows based on the CASE
      Sum(CASE WHEN A.CS_TXN_CD NOT IN ('072','075','079','107','111','112','139','181','318') THEN 1 ELSE 0 END)
      Over (PARTITION BY A.ACCT_DIM_NB   -- only the static partition; dynamic_partition is being defined here
            ORDER BY A.DAY_TIME_DIM_NB, A.TXN_POSTING_SEQ
            ROWS Unbounded Preceding) AS dynamic_partition
   FROM ...
 ) AS dt
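
To see why a cumulative sum over 0/1 gives the same result as a reset, here is a minimal Python sketch (a simulation, not Teradata code). It computes a running max over one partition's already-ordered rows in two ways: by resetting directly, and by first deriving a group number from the flags. The sample rows, the `reset` flag, and both function names are made up for illustration, and it assumes that a row meeting the reset condition starts the new group.

```python
from itertools import accumulate

def running_max_with_reset(rows):
    """rows: list of (value, reset_flag), already in ORDER BY order within one partition."""
    out, current = [], None
    for value, reset in rows:
        # A true flag starts a new group at this row (the assumed RESET WHEN behaviour).
        if reset or current is None:
            current = value
        else:
            current = max(current, value)
        out.append(current)
    return out

def running_max_with_dynamic_partition(rows):
    # Step 1: cumulative sum over 0/1 gives each series of rows its own group number.
    groups = list(accumulate(1 if reset else 0 for _, reset in rows))
    # Step 2: running max that restarts whenever the group number changes.
    out, current, last_group = [], None, None
    for (value, _), group in zip(rows, groups):
        current = value if group != last_group else max(current, value)
        last_group = group
        out.append(current)
    return out

rows = [(5, False), (7, False), (3, True), (4, False), (9, True), (2, False)]
print(running_max_with_reset(rows))              # [5, 7, 3, 4, 9, 9]
print(running_max_with_dynamic_partition(rows))  # [5, 7, 3, 4, 9, 9]
```

Both functions agree on this input, which is the point of the nested rewrite: the inner window function only labels the groups, and the outer one treats the label as an extra partition column.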

What does RESET WHEN do in Teradata?

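It resets the window accumulation whenever the clause is true. There are plenty of examples of this on the web, but in your case I imagine (never having seen it used with max) that it effectively defines a point from which the max is calculated: every time a transaction code that is not in the given list is encountered, the max is calculated from that point only.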

I was also unsure why this wasn't partitioned by PARTITION BY Y.ACCT_DIM_NB, Y.DAY_TIME_DIM_NB ORDER BY Y.DAY_TIME_DIM_NB, Y.TXN_POSTING_SEQ .

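Why do you think it should be? Partitioning and ordering are very different things. If you have a banking system you would probably partition by account, but order the transactions by date if you were preparing a bank statement.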
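
As a rough illustration of that difference, the following Python sketch groups some hypothetical transactions by account (the "partition") and only then sorts each group by date (the "order") to build a running balance. The data, the tuple layout, and the `running_balance` name are assumptions invented for this example.

```python
from collections import defaultdict

# Hypothetical transactions: (account, posting_date, amount)
txns = [
    ("A", "2024-01-03", -20),
    ("B", "2024-01-01", 50),
    ("A", "2024-01-01", 100),
    ("B", "2024-01-02", -10),
]

def running_balance(transactions):
    by_account = defaultdict(list)
    for account, date, amount in transactions:   # PARTITION BY account
        by_account[account].append((date, amount))
    statement = {}
    for account, rows in by_account.items():
        balance, lines = 0, []
        for date, amount in sorted(rows):         # ORDER BY date, within the account
            balance += amount
            lines.append((date, balance))
        statement[account] = lines
    return statement

print(running_balance(txns))
```

Partitioning by date as well would restart the balance every day, which is a different calculation from ordering by date inside one account.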

Also, is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW just using the whole partitioned window?

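It defines the stretch of records that the accumulator should look at to come up with its answer. In your case, the max is calculated only from the preceding rows plus the current row. UNBOUNDED PRECEDING means all rows since the start of the partition. CURRENT ROW means just that. Other valid examples might be: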

ROWS BETWEEN 200 preceding and current row
ROWS BETWEEN 10 preceding and 20 following 
ROWS BETWEEN current row and unbounded following
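
To make the frame bounds concrete, here is a small Python sketch that evaluates a max over a ROWS frame for every row of a single partition. The `frame_max` helper and its `preceding` / `following` parameters are hypothetical names for this illustration, with `None` standing in for UNBOUNDED.

```python
def frame_max(values, preceding=None, following=0):
    """Max over a ROWS frame for each row; None means UNBOUNDED."""
    n = len(values)
    result = []
    for i in range(n):
        # Clamp the frame to the partition boundaries.
        start = 0 if preceding is None else max(0, i - preceding)
        end = n - 1 if following is None else min(n - 1, i + following)
        result.append(max(values[start:end + 1]))
    return result

data = [3, 2, 1, 4, 1]
print(frame_max(data))                               # UNBOUNDED PRECEDING .. CURRENT ROW -> [3, 3, 3, 4, 4]
print(frame_max(data, preceding=1, following=1))     # 1 PRECEDING .. 1 FOLLOWING         -> [3, 3, 4, 4, 4]
print(frame_max(data, preceding=0, following=None))  # CURRENT ROW .. UNBOUNDED FOLLOWING -> [4, 4, 4, 4, 1]
```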

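Because your window is defined over the preceding rows only (up to the current row), the max stays at whatever maximum has been reached so far as you move through the ordered rows, until a new maximum appears in the data. For example: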

Data,max
3,3
2,3
1,3
4,4
1,4
3,4
1,4
5,5
4,5
2,5
4,5
9,9
5,9

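As you work from top to bottom, as soon as the current row holds a value larger than the known max, that value becomes the new max. Without the restriction to preceding rows, i.e. if the max were taken over the entire data set, the max reported on every row would be 9.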
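
The table can be reproduced with a few lines of Python, shown here only as a sanity check of the numbers (it is a simulation of the two frames, not Teradata code):

```python
from itertools import accumulate

data = [3, 2, 1, 4, 1, 3, 1, 5, 4, 2, 4, 9, 5]

# UNBOUNDED PRECEDING .. CURRENT ROW: the max seen so far
cumulative = list(accumulate(data, max))
# UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING: the max of the whole set on every row
whole = [max(data)] * len(data)

for value, cum, full in zip(data, cumulative, whole):
    print(value, cum, full)
```

The middle column matches the `max` column above, and the last column is 9 on every row.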