Unable to resolve Rank Over Partition with multiple variables
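I'm trying to analyse a bunch of transaction data and have set up a series of different ranks to help me. The one I can't get right is the beneficiary rank. I want it to start a new partition wherever the beneficiary changes chronologically, rather than grouping beneficiaries alphabetically.
If the same beneficiary is paid from January to March and then again in June, I want June to be classed as a separate 'session'.
I'm using Teradata SQL, if that makes a difference.
I think the solution is going to be DENSE_RANK, but if I PARTITION BY (CustomerID, Beneficiary) ORDER BY SystemDate it counts up the months. If I PARTITION BY (CustomerID) ORDER BY Beneficiary then it isn't in chronological order, and I need the highest rank to be the latest Beneficiary.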
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
    ,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
    ,RANK() OVER(PARTITION BY CustomerID ORDER BY Month ASC) AS MonthRank
    ,RANK() OVER(PARTITION BY CustomerID, Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
    ,RANK() OVER(PARTITION BY CustomerID, Beneficiary, ROUND(Amount, 0) ORDER BY SystemDate ASC) AS TransactionRank
FROM table
ORDER BY CustomerID, PaymentRank
CustomerID  Beneficiary  Amount  DateStamp (Month)  PaymentRank  MonthRank  BeneficiaryRank  TransactionRank
a           aa           10      Jan                1            1          1                1
a           aa           20      Feb                2            2          2                1
a           aa           20      Mar                3            3          3                2
a           aa           20      Apr                4            4          4                3
a           bb           20      May                5            5          1                1
a           bb           30      Jun                6            6          2                1
a           aa           30      Jul                7            7          5                2
a           aa           30      Aug                8            8          6                1
a           cc           5       Sep                9            9          1                1
a           cc           5       Oct                10           10         2                2
a           cc           5       Nov                11           11         3                3
b           cc           5       Dec                1            1          1                1
This is what I have so far. I'd like an additional column like the following:
CustomerID  Beneficiary  Amount  DateStamp (Month)  NewRank
a           aa           10      Jan                1
a           aa           20      Feb                1
a           aa           20      Mar                1
a           aa           20      Apr                1
a           bb           20      May                2
a           bb           30      Jun                2
a           aa           30      Jul                3
a           aa           30      Aug                3
a           cc           5       Sep                4
a           cc           5       Oct                4
a           cc           5       Nov                4
b           cc           5       Dec                1
This is a type of gaps-and-islands problem. I would recommend lag() and a cumulative sum:
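select t.*,
       -- start a new group when the previous payment to this beneficiary is a month or more old
       sum(case when prev_systemdate > systemdate - interval '1' month then 0 else 1 end)
           over (partition by customerid, beneficiary
                 order by systemdate
                 rows unbounded preceding) as grp  -- explicit frame so the sum accumulates; alias added for readability
from (select t.*,
             lag(systemdate) over (partition by customerid, beneficiary order by systemdate) as prev_systemdate
      from t
     ) t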
SELECT dt.*,
    -- now do a Cumulative Sum over those 0/1
    SUM(flag)
    OVER(PARTITION BY CustomerID
         ORDER BY SystemDate ASC
                 ,flag DESC -- needed if the order by columns are not unique
         ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
 (
   SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
      ,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
      ,RANK() OVER(PARTITION BY CustomerID ORDER BY Month ASC) AS MonthRank
      ,RANK() OVER(PARTITION BY CustomerID, Beneficiary ORDER BY SystemDate ASC) AS BeneficiaryRank
      ,RANK() OVER(PARTITION BY CustomerID, Beneficiary, ROUND(Amount, 0) ORDER BY SystemDate ASC) AS TransactionRank
      -- assign a 0 if current & previous Beneficiary are the same, otherwise 1
      ,CASE WHEN Beneficiary = LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate) THEN 0 ELSE 1 END AS flag
   FROM table
 ) AS dt
ORDER BY CustomerID, PaymentRank
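For anyone who wants to try the flag-plus-running-sum idea without a Teradata system, here is a minimal sketch that runs the same pattern against the question's sample rows using Python's built-in sqlite3 module. SQLite is only a stand-in here (it needs version 3.25 or later for window functions), and the payments table and the integer Seq column (1 = Jan ... 12 = Dec, replacing SystemDate) are made up for the illustration.

```python
import sqlite3

# Sample rows from the question: (CustomerID, Beneficiary, Amount, Seq).
# Seq is a hypothetical integer standing in for SystemDate (1 = Jan ... 12 = Dec).
rows = [
    ("a", "aa", 10, 1), ("a", "aa", 20, 2), ("a", "aa", 20, 3), ("a", "aa", 20, 4),
    ("a", "bb", 20, 5), ("a", "bb", 30, 6),
    ("a", "aa", 30, 7), ("a", "aa", 30, 8),
    ("a", "cc", 5, 9), ("a", "cc", 5, 10), ("a", "cc", 5, 11),
    ("b", "cc", 5, 12),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (CustomerID TEXT, Beneficiary TEXT, Amount INTEGER, Seq INTEGER)"
)
conn.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", rows)

query = """
SELECT CustomerID, Beneficiary, Seq,
       -- running total of the 0/1 flags gives the session number
       SUM(flag) OVER (PARTITION BY CustomerID
                       ORDER BY Seq
                       ROWS UNBOUNDED PRECEDING) AS NewRank
FROM (
    SELECT CustomerID, Beneficiary, Seq,
           -- 0 if current and previous Beneficiary are the same, otherwise 1
           -- (the first row of each customer compares against NULL and gets 1)
           CASE WHEN Beneficiary = LAG(Beneficiary)
                                   OVER (PARTITION BY CustomerID ORDER BY Seq)
                THEN 0 ELSE 1 END AS flag
    FROM payments
) AS dt
ORDER BY CustomerID, Seq
"""

result = conn.execute(query).fetchall()
new_ranks = [row[3] for row in result]
for row in result:
    print(row)
```

The NewRank values come out as 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4 for customer a and 1 for customer b, which is the column the question asks for.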
Your problem with Gordon's query is probably caused by your Teradata release: LAG is only supported in 16.10+. But there's a simple workaround:
LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate)
--is equivalent to
MIN(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate
                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
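The equivalence is easy to check on a few rows. The sketch below computes both expressions side by side, again using SQLite through Python's sqlite3 module as a stand-in for Teradata (SQLite 3.25 or later assumed); the payments table and the Seq ordering column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (CustomerID TEXT, Beneficiary TEXT, Seq INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("a", "aa", 1), ("a", "aa", 2), ("a", "bb", 3), ("a", "aa", 4), ("b", "cc", 1)],
)

# A frame holding exactly the one preceding row contains a single value,
# so MIN over it returns that value; for the first row of each partition
# the frame is empty and the result is NULL, just like LAG.
pairs = conn.execute("""
SELECT LAG(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY Seq) AS via_lag,
       MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY Seq
                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS via_min
FROM payments
ORDER BY CustomerID, Seq
""").fetchall()

for via_lag, via_min in pairs:
    print(via_lag, via_min)
```

Both columns agree on every row, including the NULL (None in Python) at the start of each customer's partition.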
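Thanks to @Gordon and @dnoeth for the ideas and code that put me on the right track.
The below is mostly from dnoeth, but I needed to add ROWS UNBOUNDED PRECEDING to get the correct aggregation. Without it, the result just showed the total for the partition. I also changed systemdate to paymentrank, because I had to fiddle about with duplicate entries on the same day.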
SELECT dt.*,
    -- now do a Cumulative Sum over those 0/1
    SUM(flag) OVER(PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
 (
   SELECT CustomerID, Beneficiary, Amount, SystemDate, Month, PaymentRank
      -- assign a 0 if current & previous Beneficiary are the same, otherwise 1
      ,CASE WHEN Beneficiary = MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) THEN 0 ELSE 1 END AS flag
   FROM table -- the FROM clause was missing from the post; this assumes PaymentRank already exists as a column in the source
 ) AS dt
ORDER BY CustomerID, PaymentRank
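The inner query sets a flag whenever the beneficiary changes. The outer query then does a cumulative sum of those flags.
I wasn't sure what UNBOUNDED PRECEDING was doing. @dnoeth has a good explanation, and the following is an excerpt from it: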
• UNBOUNDED PRECEDING: all rows before the current row -> fixed
• UNBOUNDED FOLLOWING: all rows after the current row -> fixed
• x PRECEDING: x rows before the current row -> relative
• y FOLLOWING: y rows after the current row -> relative
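To make the difference between these frames concrete, here is a small sketch that sums the same 0/1 flags with three different frames. It uses SQLite through Python's sqlite3 module as a stand-in for Teradata (SQLite 3.25 or later assumed), and the flags table with its Seq column is invented for the illustration. Every frame is spelled out explicitly, because the default frame when the clause is omitted is not the same in every database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (Seq INTEGER, flag INTEGER)")
conn.executemany(
    "INSERT INTO flags VALUES (?, ?)",
    [(1, 1), (2, 0), (3, 1), (4, 0), (5, 1)],
)

frames = conn.execute("""
SELECT Seq,
       -- fixed start, ends at the current row: a running total
       SUM(flag) OVER (ORDER BY Seq
                       ROWS UNBOUNDED PRECEDING) AS running,
       -- fixed start and fixed end: the same partition total on every row
       SUM(flag) OVER (ORDER BY Seq
                       ROWS BETWEEN UNBOUNDED PRECEDING
                                AND UNBOUNDED FOLLOWING) AS whole,
       -- relative bounds: a window that slides with the current row
       SUM(flag) OVER (ORDER BY Seq
                       ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sliding
FROM flags
ORDER BY Seq
""").fetchall()

for row in frames:
    print(row)
```

The running column grows as 1, 1, 2, 2, 3, while the whole column is 3 on every row. That second pattern is the "total for the partition" result described above for the query without ROWS UNBOUNDED PRECEDING.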