Teradata SQL: Calculate running totals if a condition is met

I have a dataset with the following columns and data:

Customer | Week_number | Amount
cust1    |  0          | 100
cust1    |  1          | 200
cust1    |  3          | 300
cust2    |  0          | 1000
cust2    |  1          | 2000

I need to calculate a two-week (fortnightly) total for each customer.

Using a window function, I can do this:

SELECT 
 CUSTOMER, WEEK_NUMBER
, SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS 1 PRECEDING) AS FORTNIGHT_AMOUNT
FROM AMOUNT

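But this adds the amounts together even when there is no amount for the previous week. In the example above, for the third row of cust1 (week 3), it adds week 3 and week 1. The previous amount should be added only if its week_number is exactly 1 less than the week number of the current row. Is this possible? Thanks for your help.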

What I get:

Customer | Week_number | Fortnight_Amount
cust1    |  0          | 100
cust1    |  1          | 300
cust1    |  3          | **500**
cust2    |  0          | 1000
cust2    |  1          | 3000

Expected result:

Customer | Week_number | Fortnight_Amount
cust1    |  0          | 100
cust1    |  1          | 300
cust1    |  3          | **300**
cust2    |  0          | 1000
cust2    |  1          | 3000

If you just want to ignore weeks that are not immediately consecutive, you can use lag() first and then do a window sum():

select
    customer,
    week_number,
    sum(
        case when lag_week_number is null or week_number = lag_week_number + 1 
            then amount
            else 0 
        end
    ) over(partition by customer order by week_number) fortnight_amount
from (
    select 
        t.*, 
        lag(week_number) over(partition by customer order by week_number) lag_week_number
    from mytable t
) t

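Actually, when there are gaps in the week numbers, you might rather want to reset the sum. That is a kind of gaps-and-islands problem, and you would approach it differently: the idea is to build a cumulative sum that starts a new group whenever two adjacent rows' week numbers are not consecutive, then sum within each group: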

select 
    customer,
    week_number,
    sum(amount) over(partition by customer, grp order by week_number) fortnight_amount
from (
    select 
        t.*,
        sum(
            case 
                when lag_week_number is null or week_number = lag_week_number + 1 
                then 0
                else 1
            end
        ) over(partition by customer order by week_number) grp
    from (
        select 
            t.*, 
            lag(week_number) over(partition by customer order by week_number) lag_week_number
        from mytable t
    ) t
) t

You want a RANGE window frame, not a ROWS frame:

SELECT CUSTOMER, WEEK_NUMBER,
       SUM(AMOUNT) OVER (PARTITION BY CUSTOMER
                         ORDER BY WEEK_NUMBER 
                         RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
                        ) AS FORTNIGHT_AMOUNT
FROM AMOUNT;

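Thanks @Gordon and @GMB for your answers. Unfortunately, I could not use either the LAG function or a RANGE frame in Teradata SQL. But I was able to arrive at the following answer using the concepts you described.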

SELECT 
CUSTOMER
, WEEK_NUMBER
, LAG_WEEK_NUMBER
, AMOUNT
, CASE 
  WHEN WEEK_NUMBER = LAG_WEEK_NUMBER + 1 
  THEN SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
  ELSE AMOUNT
END AS TWO_WEEK_SUM_AMOUNT
FROM (
  SELECT 
  T.*
  , MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LAG_WEEK_NUMBER
  FROM MY_TABLE T
  ) T
ORDER BY CUSTOMER, WEEK_NUMBER

I got the implementation of the LAG function in Teradata from @dnoeth's answers at these links:

MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LAG_WEEK_NUMBER

rows between 1 preceding and preceding 1

Teradata partitioned query ... following rows dynamically

Please let me know if you see any problems with this answer or if it can be improved in any way.

If it is only ever two weeks/rows, your query can be simplified further so that Explain shows a single statistics step (because both OLAP functions use the same PARTITION/ORDER):

SELECT T.*
, CASE 
    WHEN MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 1 = WEEK_NUMBER
    THEN SUM(AMOUNT)      OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
   ELSE AMOUNT
  END AS TWO_WEEK_SUM_AMOUNT
FROM MY_TABLE T
ORDER BY CUSTOMER, WEEK_NUMBER

Of course, this assumes that weeks start at 0 and that there is no week 52/53 carried over from the previous year.
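
For comparison, the same rule ("add the previous amount only if it belongs to the week immediately before") can also be written without any window function, as a self-join. This is only a sketch under assumptions: it reuses the MY_TABLE name from the queries above, the aliases CUR and PREV are made up for illustration, and it assumes at most one row per customer and week. It shares the same caveat about week numbers wrapping at a year boundary.

```sql
-- Sketch: pair each row with the same customer's row from exactly one week earlier.
-- Assumption: MY_TABLE holds at most one row per (CUSTOMER, WEEK_NUMBER).
SELECT
  CUR.CUSTOMER
, CUR.WEEK_NUMBER
  -- PREV.AMOUNT is NULL when there is no row for the previous week,
  -- so COALESCE makes that case contribute 0.
, CUR.AMOUNT + COALESCE(PREV.AMOUNT, 0) AS TWO_WEEK_SUM_AMOUNT
FROM MY_TABLE CUR
LEFT JOIN MY_TABLE PREV
  ON  PREV.CUSTOMER    = CUR.CUSTOMER
  AND PREV.WEEK_NUMBER = CUR.WEEK_NUMBER - 1
ORDER BY CUR.CUSTOMER, CUR.WEEK_NUMBER
```

On the sample data in the question this should give 100, 300, 300 for cust1 and 1000, 3000 for cust2. The join reads the table twice, so the single-step OLAP version above is likely the better choice on large tables.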