Teradata SQL: Calculate running totals if a condition is met
I have a dataset with the following columns and data:
Customer | Week_number | Amount
cust1 | 0 | 100
cust1 | 1 | 200
cust1 | 3 | 300
cust2 | 0 | 1000
cust2 | 1 | 2000
I need to calculate a fortnightly (two-week) total for each customer.
Using a window function, I can do this:
SELECT
CUSTOMER, WEEK_NUMBER
, SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS 1 PRECEDING) AS FORTNIGHT_AMOUNT
FROM AMOUNT
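However, this adds the previous row's amount even when there is no amount for the previous week. In the example above, for the third row of cust1, it adds week 3 and week 1 together. The previous amount should be added only when its week_number is exactly 1 less than the current row's week number. Is this possible? Thanks for your help.
What I get: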
Customer | Week_number | Fortnight_Amount
cust1 | 0 | 100
cust1 | 1 | 300
cust1 | 3 | **500**
cust2 | 0 | 1000
cust2 | 1 | 3000
Expected result:
Customer | Week_number | Fortnight_Amount
cust1 | 0 | 100
cust1 | 1 | 300
cust1 | 3 | **300**
cust2 | 0 | 1000
cust2 | 1 | 3000
If you just want to ignore weeks that are not immediately consecutive, you can use lag() first and then a window sum():
select
customer,
week_number,
sum(
case when lag_week_number is null or week_number = lag_week_number + 1
then amount
else 0
end
) over(partition by customer order by week_number) fortnight_amount
from (
select
t.*,
lag(week_number) over(partition by customer order by week_number) lag_week_number
from mytable t
) t
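Actually, you might want to reset the sum whenever there is a gap in the week numbers. That is a kind of gaps-and-islands problem, and you would approach it differently: the idea is to build a cumulative sum that starts a new group whenever two successive week numbers are not consecutive, and then sum within each group: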
select
customer,
week_number,
sum(amount) over(partition by customer, grp order by week_number) fortnight_amount
from (
select
t.*,
sum(
case
when lag_week_number is null or week_number = lag_week_number + 1
then 0
else 1
end
) over(partition by customer order by week_number) grp
from (
select
t.*,
lag(week_number) over(partition by customer order by week_number) lag_week_number
from mytable t
) t
) t
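To make the grouping step concrete, here is a minimal sketch that runs the gaps-and-islands idea against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for Teradata (an assumption made only so the example is runnable; it needs SQLite 3.25 or later for window functions). The helper name `running_sum_per_island` and the explicit `rows unbounded preceding` frames are mine, added so that both sums are cumulative no matter what frame an engine uses by default.

```python
import sqlite3

SAMPLE = [
    ("cust1", 0, 100), ("cust1", 1, 200), ("cust1", 3, 300),
    ("cust2", 0, 1000), ("cust2", 1, 2000),
]

def running_sum_per_island(rows):
    """Return (customer, week_number, grp, running_amount) tuples.

    grp increases by 1 each time a week does not directly follow the
    previous week, so the running sum restarts after every gap.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (customer text, week_number integer, amount integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    query = """
        select customer, week_number, grp,
               sum(amount) over(partition by customer, grp
                                order by week_number
                                rows unbounded preceding) as running_amount
        from (
            select t.*,
                   -- flag 1 on the first row after a gap, then accumulate the flags
                   sum(case when lag_week_number is null
                              or week_number = lag_week_number + 1
                            then 0 else 1 end)
                       over(partition by customer
                            order by week_number
                            rows unbounded preceding) as grp
            from (
                select t.*,
                       lag(week_number) over(partition by customer
                                             order by week_number) as lag_week_number
                from mytable t
            ) t
        ) t
        order by customer, week_number
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

for row in running_sum_per_island(SAMPLE):
    print(row)
```

Note that within an island this is a running total over all consecutive weeks, not a strict two-week window, so it only matches the fortnight requirement when no island is longer than two weeks.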
You want a RANGE window frame, not a ROWS window frame:
SELECT CUSTOMER, WEEK_NUMBER,
SUM(AMOUNT) OVER (PARTITION BY CUSTOMER
ORDER BY WEEK_NUMBER
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
) AS FORTNIGHT_AMOUNT
FROM AMOUNT;
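For readers who want to see the difference between the two frame types, the sketch below runs the same windowed SUM twice on the question's sample data, once with a ROWS frame and once with a RANGE frame. It uses Python's sqlite3 module rather than Teradata, purely as an assumption to keep it runnable (RANGE frames with a numeric offset need SQLite 3.28 or later). The table name `weekly_amount` and the helper `fortnight_amounts` are made up for the illustration.

```python
import sqlite3

SAMPLE = [
    ("cust1", 0, 100), ("cust1", 1, 200), ("cust1", 3, 300),
    ("cust2", 0, 1000), ("cust2", 1, 2000),
]

def fortnight_amounts(frame_clause):
    """Run the windowed SUM with the given frame clause; return all result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weekly_amount (customer TEXT, week_number INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO weekly_amount VALUES (?, ?, ?)", SAMPLE)
    query = f"""
        SELECT customer, week_number,
               SUM(amount) OVER (PARTITION BY customer
                                 ORDER BY week_number
                                 {frame_clause}) AS fortnight_amount
        FROM weekly_amount
        ORDER BY customer, week_number
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# ROWS counts physical rows: the previous row is included even across a gap.
by_rows = fortnight_amounts("ROWS BETWEEN 1 PRECEDING AND CURRENT ROW")
# RANGE compares ORDER BY values: only rows with week_number >= current - 1 qualify.
by_range = fortnight_amounts("RANGE BETWEEN 1 PRECEDING AND CURRENT ROW")

for r, g in zip(by_rows, by_range):
    print(r[0], r[1], "rows:", r[2], "range:", g[2])
```

On the sample data the two frames differ only for cust1, week 3: the ROWS frame gives 500, while the RANGE frame gives the expected 300.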
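Thanks @Gordon and @GMB for your answers. Unfortunately, I was not able to use either the LAG function or a RANGE frame in Teradata SQL. But using the concepts you described, I was able to arrive at the following answer.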
SELECT
CUSTOMER
, WEEK_NUMBER
, LAG_WEEK_NUMBER
, AMOUNT
, CASE
WHEN WEEK_NUMBER = LAG_WEEK_NUMBER + 1
THEN SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
ELSE AMOUNT
END AS TWO_WEEK_SUM_AMOUNT
FROM (
SELECT
T.*
, MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LAG_WEEK_NUMBER
FROM MY_TABLE T
) T
ORDER BY CUSTOMER, WEEK_NUMBER
I got this way of implementing LAG in Teradata from @dnoeth's answers at these links:
MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS LAG_WEEK_NUMBER
rows between 1 preceding and preceding 1
Teradata partitioned query ... following rows dynamically
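Please let me know if you see any problems with the answer or any way it could be improved.
If it's only two weeks/rows, your query can be simplified further to a single STAT step in Explain (because both OLAP functions use the same PARTITION/ORDER):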
SELECT T.*
, CASE
WHEN MAX(WEEK_NUMBER) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) + 1 = WEEK_NUMBER
THEN SUM(AMOUNT) OVER (PARTITION BY CUSTOMER ORDER BY WEEK_NUMBER ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
ELSE AMOUNT
END AS TWO_WEEK_SUM_AMOUNT
FROM MY_TABLE T
ORDER BY CUSTOMER, WEEK_NUMBER
Of course, this assumes that weeks start at 0 and that there is no week 52/53 from the previous year.
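As a quick sanity check of the simplified query, here is a small sketch that runs the same logic on the question's sample data. It uses Python's sqlite3 module in place of Teradata (an assumption made only to keep it runnable; SQLite 3.25 or later is needed for window functions), and the helper name `two_week_sums` is hypothetical. The extra lag_week_number column is included just to show what the emulated LAG returns.

```python
import sqlite3

SAMPLE = [
    ("cust1", 0, 100), ("cust1", 1, 200), ("cust1", 3, 300),
    ("cust2", 0, 1000), ("cust2", 1, 2000),
]

def two_week_sums(rows):
    """Return (customer, week_number, lag_week_number, two_week_sum_amount)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (customer TEXT, week_number INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT customer, week_number,
               -- A one-row frame holding only the previous row emulates LAG;
               -- on the first row of a partition the frame is empty, so MAX is NULL.
               MAX(week_number) OVER (PARTITION BY customer ORDER BY week_number
                                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                   AS lag_week_number,
               CASE
                   WHEN MAX(week_number) OVER (PARTITION BY customer ORDER BY week_number
                                               ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                        + 1 = week_number
                   THEN SUM(amount) OVER (PARTITION BY customer ORDER BY week_number
                                          ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
                   -- NULL + 1 = week_number is not true, so first rows fall through here
                   ELSE amount
               END AS two_week_sum_amount
        FROM my_table
        ORDER BY customer, week_number
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

for row in two_week_sums(SAMPLE):
    print(row)
```

For cust1, week 3, the emulated lag is 1, so the CASE falls through to the row's own amount and the result matches the expected 300.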