SQL - Selecting a column value based on max value in another column and combination of values in another column - Teradata

Sample data from my input Teradata table accnt_pln_info:

Account_number   Plan_code   Plan_Date    Base_Amount     Biz_Date
ACCT1            R           2017-JAN-01         100      2017-MAY-31
ACCT1            R           2017-JAN-11          30      2017-MAY-31
ACCT1            K           2017-JAN-22          80      2017-MAY-31
ACCT1            B           2017-JAN-13          50      2017-MAY-31
ACCT1            C           2017-JAN-18         180      2017-MAY-31
ACCT2            R           2017-JAN-12          70      2017-MAY-31
ACCT2            C           2017-JAN-02          90      2017-MAY-31
ACCT2            R           2017-JAN-08          10      2017-MAY-31
ACCT2            D           2017-JAN-02          40      2017-MAY-31
ACCT2            B           2017-FEB-24          14      2017-MAY-31
ACCT2            K           2017-FEB-12          79      2017-MAY-31

Expected output (for the filter condition Biz_Date = 2017-MAY-31):

Account_number   RK_Plan_Date    RK_Base_Amount   RC_Plan_Date   RC_Base_Amount
ACCT1            2017-JAN-22          80          2017-JAN-18         180
ACCT2            2017-FEB-12          79          2017-JAN-12          70    

Logic:

Apply the filter condition Biz_Date = 2017-MAY-31, because the table contains multiple distinct Biz_Date values.
Group by Account_Number. For Plan_Code in (R,K), find the max Plan_Date and return that row's Base_Amount.
For Plan_Code in (R,C), find the max Plan_Date and return that row's Base_Amount.

For example, for ACCT1 and plan_code in ('R','K') the max plan_date is 2017-JAN-22, so I need the Base_Amount from that row, which is 80.

Assumptions:

There can be duplicates on (Account_number, Plan_Code).
There will be no duplicates on (Account_number, Plan_Date) among rows with Plan_Code in (R,K).
There will be no duplicates on (Account_number, Plan_Date) among rows with Plan_Code in (R,C).
Rows in the table are not stored in any particular order.
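The logic and assumptions above can be pinned down with a plain-Python reference sketch. It is not SQL, and it is not the asker's code. It assumes the sample rows are held in memory, and it uses a hypothetical `COMBOS` mapping from an output prefix to the plan codes in each combination. Because the assumptions rule out duplicate plan dates within a combination, "the row with the latest Plan_Date" is always unique.

```python
from datetime import date

# Sample rows from the question: (account, plan_code, plan_date, base_amount, biz_date)
ROWS = [
    ("ACCT1", "R", date(2017, 1, 1), 100, date(2017, 5, 31)),
    ("ACCT1", "R", date(2017, 1, 11), 30, date(2017, 5, 31)),
    ("ACCT1", "K", date(2017, 1, 22), 80, date(2017, 5, 31)),
    ("ACCT1", "B", date(2017, 1, 13), 50, date(2017, 5, 31)),
    ("ACCT1", "C", date(2017, 1, 18), 180, date(2017, 5, 31)),
    ("ACCT2", "R", date(2017, 1, 12), 70, date(2017, 5, 31)),
    ("ACCT2", "C", date(2017, 1, 2), 90, date(2017, 5, 31)),
    ("ACCT2", "R", date(2017, 1, 8), 10, date(2017, 5, 31)),
    ("ACCT2", "D", date(2017, 1, 2), 40, date(2017, 5, 31)),
    ("ACCT2", "B", date(2017, 2, 24), 14, date(2017, 5, 31)),
    ("ACCT2", "K", date(2017, 2, 12), 79, date(2017, 5, 31)),
]

# Hypothetical mapping: output prefix -> plan codes in that combination.
# The real table has 6 combinations of 4 codes each; only 2 are shown here.
COMBOS = {"RK": {"R", "K"}, "RC": {"R", "C"}}


def latest_per_combo(rows, biz_date, combos):
    """For each account and combination, keep (plan_date, base_amount)
    of the row with the latest plan_date."""
    result = {}
    for acct, code, plan_date, amount, row_biz_date in rows:
        if row_biz_date != biz_date:
            continue  # the Biz_Date filter
        for name, codes in combos.items():
            if code not in codes:
                continue
            best = result.setdefault(acct, {}).get(name)
            if best is None or plan_date > best[0]:
                result[acct][name] = (plan_date, amount)
    return result


out = latest_per_combo(ROWS, date(2017, 5, 31), COMBOS)
for acct in sorted(out):
    print(acct, out[acct]["RK"], out[acct]["RC"])
```

Each row is checked against every combination, so a code such as R that belongs to several combinations is counted in all of them. That overlap is the part a single GROUP BY cannot express directly.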

What I tried that failed:

SELECT ACCOUNT_NUMBER, 
MAX(CASE WHEN PLAN_CODE IN ('R','K') THEN PLAN_DATE END) MAX_RK_PLAN_DATE,
MAX(CASE WHEN PLAN_CODE IN ('R','K') AND MAX_PLAN_DATE=PLAN_DATE THEN BASE_AMOUNT END) REQUIRED_RK_AMOUNT,
MAX(CASE WHEN PLAN_CODE IN ('R','C') THEN PLAN_DATE END) MAX_RC_PLAN_DATE,
MAX(CASE WHEN PLAN_CODE IN ('R','C') AND MAX_PLAN_DATE=PLAN_DATE THEN BASE_AMOUNT END) REQUIRED_RC_AMOUNT 
FROM ACCNT_PLN_INFO
GROUP BY ACCOUNT_NUMBER;

As expected, it failed, because I nested an aggregate inside an ordinary CASE expression. I thought about using derived tables by splitting the data into:

SELECT ....
FROM (
SELECT ACCOUNT_NUMBER, 'RK' AS PLAN_COMBO,
MAX(PLAN_DATE) AS MAX_PLAN_DATE FROM ACCNT_PLN_INFO WHERE 
PLAN_CODE IN ('R','K') GROUP BY ACCOUNT_NUMBER
UNION 
SELECT ACCOUNT_NUMBER, 'RC', 
MAX(PLAN_DATE) FROM ACCNT_PLN_INFO WHERE 
PLAN_CODE IN ('R','C') GROUP BY ACCOUNT_NUMBER ) dt

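Then I wanted to join the outer SELECT back to the same table. But because of the different possible combinations, such as (R,K) and (R,C), I couldn't make it work. I know how to do it when no combinations are involved.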

For simplicity I've shown only 2 combinations of 2 values each: PLAN_CODE IN ('R','K') and PLAN_CODE IN ('R','C'). In reality there are 6 combinations, each with 4 values.

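I've tried everything I can think of, but unfortunately I couldn't get it to work. How do I select a column value when I need the max of another column across multiple combinations of values in a third column? Thanks for your time.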

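Edit: rewritten using QUALIFY.

You need to get the max plan date for each plan_code pairing. You can do this in two separate derived tables, using QUALIFY to keep only the row with the max plan date. Then you can join the two results together on account_number.
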
select
rk.account_number,
rk_plan_date,
rk.base_amount as rk_base_amount,
rc.rc_plan_date,
rc.base_amount as rc_base_amount
from
(
select
    ACCNT_PLN_INFO.account_number,
    ACCNT_PLN_INFO.plan_date as rk_plan_date,
    base_amount
from 
    ACCNT_PLN_INFO
where
    plan_code in ('R','K')
    and biz_date = DATE '2017-05-31'
qualify row_number() over (partition by ACCNT_PLN_INFO.account_number order by plan_date desc) = 1
) rk
inner join 
(select
    ACCNT_PLN_INFO.account_number,
    ACCNT_PLN_INFO.plan_date as rc_plan_date,
    base_amount
from 
    ACCNT_PLN_INFO
where
    plan_code in ('R','C')
    and biz_date = DATE '2017-05-31'
qualify row_number() over (partition by ACCNT_PLN_INFO.account_number order by plan_date desc) = 1
)RC
on RK.account_number = rc.account_number
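To try the ROW_NUMBER idea outside Teradata, here is a sketch using Python's `sqlite3` module as a stand-in. It assumes SQLite 3.25 or newer, which is needed for window functions, and it stores dates as ISO-8601 text so they sort correctly. This sketch does not rely on QUALIFY: it computes `ROW_NUMBER()` in each derived table and filters on `rn = 1` in the outer query. It also applies the Biz_Date filter from the question.

```python
import sqlite3

# Sample rows from the question; dates stored as ISO-8601 text (an assumption for SQLite).
SAMPLE = [
    ("ACCT1", "R", "2017-01-01", 100, "2017-05-31"),
    ("ACCT1", "R", "2017-01-11", 30, "2017-05-31"),
    ("ACCT1", "K", "2017-01-22", 80, "2017-05-31"),
    ("ACCT1", "B", "2017-01-13", 50, "2017-05-31"),
    ("ACCT1", "C", "2017-01-18", 180, "2017-05-31"),
    ("ACCT2", "R", "2017-01-12", 70, "2017-05-31"),
    ("ACCT2", "C", "2017-01-02", 90, "2017-05-31"),
    ("ACCT2", "R", "2017-01-08", 10, "2017-05-31"),
    ("ACCT2", "D", "2017-01-02", 40, "2017-05-31"),
    ("ACCT2", "B", "2017-02-24", 14, "2017-05-31"),
    ("ACCT2", "K", "2017-02-12", 79, "2017-05-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accnt_pln_info (account_number TEXT, plan_code TEXT,"
    " plan_date TEXT, base_amount INTEGER, biz_date TEXT)"
)
conn.executemany("INSERT INTO accnt_pln_info VALUES (?, ?, ?, ?, ?)", SAMPLE)

# Number the rows of each combination per account, newest plan_date first,
# then keep rn = 1 from both sides (the portable equivalent of QUALIFY ... = 1).
QUERY = """
SELECT rk.account_number, rk.plan_date, rk.base_amount, rc.plan_date, rc.base_amount
FROM (
    SELECT account_number, plan_date, base_amount,
           ROW_NUMBER() OVER (PARTITION BY account_number ORDER BY plan_date DESC) AS rn
    FROM accnt_pln_info
    WHERE plan_code IN ('R', 'K') AND biz_date = '2017-05-31'
) rk
INNER JOIN (
    SELECT account_number, plan_date, base_amount,
           ROW_NUMBER() OVER (PARTITION BY account_number ORDER BY plan_date DESC) AS rn
    FROM accnt_pln_info
    WHERE plan_code IN ('R', 'C') AND biz_date = '2017-05-31'
) rc
  ON rk.account_number = rc.account_number
WHERE rk.rn = 1 AND rc.rn = 1
ORDER BY rk.account_number
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With 6 combinations, the same pattern needs one derived table and one join per combination. The inner join also drops any account that has no row in one of the combinations.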

Original (non-Teradata-specific syntax):

select
rk.account_number,
rk_plan_date,
rk.base_amount as rk_base_amount,
rc.rc_plan_date,
rc.base_amount as rc_base_amount
from (
    select
    ACCNT_PLN_INFO.account_number,
    ACCNT_PLN_INFO.plan_date as rk_plan_date,
    base_amount
    from 
    ACCNT_PLN_INFO
    inner join (
    select
    account_number,
    max(plan_date) as plan_date
    from
    ACCNT_PLN_INFO
    where
    plan_code in ('R','K')
    and biz_date = DATE '2017-05-31'
    group by 1) rk
        on ACCNT_PLN_INFO.account_number = rk.account_number
        and ACCNT_PLN_INFO.plan_date = rk.plan_date
        and ACCNT_PLN_INFO.plan_code in ('R','K')
        and ACCNT_PLN_INFO.biz_date = DATE '2017-05-31'
) RK
inner join (
    select
    ACCNT_PLN_INFO.account_number,
    ACCNT_PLN_INFO.plan_date as rc_plan_date,
    base_amount
    from
    ACCNT_PLN_INFO
    inner join (
    select
    account_number,
    max(plan_date) as plan_date
    from
    ACCNT_PLN_INFO
    where
    plan_code in ('R','C')
    and biz_date = DATE '2017-05-31'
    group by 1) rc
        on ACCNT_PLN_INFO.account_number = rc.account_number
        and ACCNT_PLN_INFO.plan_date = rc.plan_date
        and ACCNT_PLN_INFO.plan_code in ('R','C')
        and ACCNT_PLN_INFO.biz_date = DATE '2017-05-31'
) RC
on RK.account_number = rc.account_number
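The "max per group, then join back" version needs no window functions, so it can be sketched on any SQLite build through Python's `sqlite3`. The helper `latest_subquery` is hypothetical and exists only for this illustration. It generates the same derived table for any list of plan codes, which hints at how the pattern would scale to the 6 real combinations. Inlining the codes into the SQL text is acceptable here only because they are fixed literals.

```python
import sqlite3

SAMPLE = [
    ("ACCT1", "R", "2017-01-01", 100, "2017-05-31"),
    ("ACCT1", "R", "2017-01-11", 30, "2017-05-31"),
    ("ACCT1", "K", "2017-01-22", 80, "2017-05-31"),
    ("ACCT1", "B", "2017-01-13", 50, "2017-05-31"),
    ("ACCT1", "C", "2017-01-18", 180, "2017-05-31"),
    ("ACCT2", "R", "2017-01-12", 70, "2017-05-31"),
    ("ACCT2", "C", "2017-01-02", 90, "2017-05-31"),
    ("ACCT2", "R", "2017-01-08", 10, "2017-05-31"),
    ("ACCT2", "D", "2017-01-02", 40, "2017-05-31"),
    ("ACCT2", "B", "2017-02-24", 14, "2017-05-31"),
    ("ACCT2", "K", "2017-02-12", 79, "2017-05-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accnt_pln_info (account_number TEXT, plan_code TEXT,"
    " plan_date TEXT, base_amount INTEGER, biz_date TEXT)"
)
conn.executemany("INSERT INTO accnt_pln_info VALUES (?, ?, ?, ?, ?)", SAMPLE)


def latest_subquery(codes):
    """Hypothetical helper: the row with the max plan_date per account for one combination.
    The codes are fixed literals here, so inlining them is safe for this demo."""
    in_list = ", ".join(f"'{c}'" for c in codes)
    return f"""
        SELECT t.account_number AS account_number,
               t.plan_date AS plan_date,
               t.base_amount AS base_amount
        FROM accnt_pln_info t
        INNER JOIN (
            SELECT account_number, MAX(plan_date) AS plan_date
            FROM accnt_pln_info
            WHERE plan_code IN ({in_list}) AND biz_date = '2017-05-31'
            GROUP BY account_number
        ) m
          ON t.account_number = m.account_number
         AND t.plan_date = m.plan_date
         AND t.plan_code IN ({in_list})
         AND t.biz_date = '2017-05-31'
    """


QUERY = f"""
SELECT rk.account_number, rk.plan_date, rk.base_amount, rc.plan_date, rc.base_amount
FROM ({latest_subquery(("R", "K"))}) rk
INNER JOIN ({latest_subquery(("R", "C"))}) rc
  ON rk.account_number = rc.account_number
ORDER BY rk.account_number
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The join back to the base table repeats the plan_code and biz_date filters. Without them, a row from another combination or another Biz_Date that happens to share the same account and plan_date would also match.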

You can use an approach similar to the aggregation you attempted by applying a dirty trick: piggybacking.

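You combine the two columns into a single string, apply MAX, and then split off the date part again. For example, for ACCT1 with plan_code in ('R','K'), combining PLAN_DATE and BASE_AMOUNT into one string gives:

'20170101        100'
'20170111         30'
'20170122         80' -- this will be returned by MAX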

After applying MAX, you extract the two columns again with SUBSTRING:

   To_Date(SUBSTR('20170122         80', 1, 8), 'yyyymmdd')
   CAST(SUBSTR('20170122         80', 9) AS INT)

Of course, you must build a string that still sorts the right way, e.g. yyyymmdd for the date and a fixed width, including leading spaces, for the number.
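The piggyback idea is independent of SQL, so here is a minimal Python sketch of it. The 11-character width for the amount is an assumption chosen to match the padded strings shown above. The helper names `pack` and `unpack` are hypothetical. Because the date prefix always has exactly 8 characters, a plain string comparison orders the packed values by date first.

```python
from datetime import date


def pack(plan_date, amount, width=11):
    # yyyymmdd sorts the same way as the date; the amount is right-aligned
    # with leading spaces to a fixed width (11 is an assumed width).
    return plan_date.strftime("%Y%m%d") + f"{amount:>{width}d}"


def unpack(packed):
    # Characters 1-8 are the date, the rest is the (space-padded) amount.
    d = date(int(packed[0:4]), int(packed[4:6]), int(packed[6:8]))
    return d, int(packed[8:])  # int() ignores the leading spaces


# ACCT1 rows with plan_code in ('R', 'K')
rk_rows = [(date(2017, 1, 1), 100), (date(2017, 1, 11), 30), (date(2017, 1, 22), 80)]
packed = [pack(d, a) for d, a in rk_rows]
best = max(packed)  # plain string MAX, like MAX() over a character column
print(repr(best))
print(unpack(best))
```

MAX over the strings picks the 2017-01-22 entry. Unpacking it recovers both the date and the amount 80, which is what the SQL below does with SUBSTR and CAST.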

Now some cut, paste and modify:

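SELECT ACCOUNT_NUMBER,
   -- split the winning string back into the date (chars 1-8) and the amount (char 9 onward)
   To_Date(Substr(RK, 1,8), 'yyyymmdd') AS MAX_RK_PLAN_DATE,
   Cast(Substring(RK From 9) AS INT) AS REQUIRED_RK_AMOUNT,
   To_Date(Substr(RC, 1,8), 'yyyymmdd') AS MAX_RC_PLAN_DATE,
   Cast(Substring(RC From 9) AS INT) AS REQUIRED_RC_AMOUNT
FROM 
 ( 
   SELECT ACCOUNT_NUMBER, 
      -- yyyymmdd date followed by the amount, so MAX returns the latest date per combination;
      -- concatenating the INTEGER relies on Teradata's implicit conversion padding it with leading spaces
      Max(CASE WHEN PLAN_code IN ('R','K') THEN To_Char(PLAN_DATE, 'yyyymmdd') || BASE_AMOUNT END) AS RK,
      Max(CASE WHEN PLAN_code IN ('R','C') THEN To_Char(PLAN_DATE, 'yyyymmdd') || BASE_AMOUNT END) AS RC
   FROM ACCNT_PLN_INFO
   WHERE  biz_date = DATE '2017-05-31'
   GROUP BY 1
 ) AS dt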
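To check the piggyback query end to end without a Teradata system, here is a sketch that ports it to SQLite through Python's `sqlite3`. The port makes some assumptions. `strftime('%Y%m%d', ...)` stands in for `To_Char(..., 'yyyymmdd')`. An explicit `printf('%11d', ...)` supplies the fixed-width padding, since SQLite does not pad integers implicitly. The date is reassembled as ISO text because the dates are stored as text here. This is a single pass with one GROUP BY, and each additional combination adds just one more MAX(CASE ...) column.

```python
import sqlite3

SAMPLE = [
    ("ACCT1", "R", "2017-01-01", 100, "2017-05-31"),
    ("ACCT1", "R", "2017-01-11", 30, "2017-05-31"),
    ("ACCT1", "K", "2017-01-22", 80, "2017-05-31"),
    ("ACCT1", "B", "2017-01-13", 50, "2017-05-31"),
    ("ACCT1", "C", "2017-01-18", 180, "2017-05-31"),
    ("ACCT2", "R", "2017-01-12", 70, "2017-05-31"),
    ("ACCT2", "C", "2017-01-02", 90, "2017-05-31"),
    ("ACCT2", "R", "2017-01-08", 10, "2017-05-31"),
    ("ACCT2", "D", "2017-01-02", 40, "2017-05-31"),
    ("ACCT2", "B", "2017-02-24", 14, "2017-05-31"),
    ("ACCT2", "K", "2017-02-12", 79, "2017-05-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accnt_pln_info (account_number TEXT, plan_code TEXT,"
    " plan_date TEXT, base_amount INTEGER, biz_date TEXT)"
)
conn.executemany("INSERT INTO accnt_pln_info VALUES (?, ?, ?, ?, ?)", SAMPLE)

QUERY = """
SELECT account_number,
       -- chars 1-8 hold yyyymmdd; rebuild an ISO date string from them
       substr(rk, 1, 4) || '-' || substr(rk, 5, 2) || '-' || substr(rk, 7, 2) AS max_rk_plan_date,
       CAST(substr(rk, 9) AS INTEGER) AS required_rk_amount,  -- leading spaces are ignored
       substr(rc, 1, 4) || '-' || substr(rc, 5, 2) || '-' || substr(rc, 7, 2) AS max_rc_plan_date,
       CAST(substr(rc, 9) AS INTEGER) AS required_rc_amount
FROM (
    SELECT account_number,
           MAX(CASE WHEN plan_code IN ('R', 'K')
                    THEN strftime('%Y%m%d', plan_date) || printf('%11d', base_amount) END) AS rk,
           MAX(CASE WHEN plan_code IN ('R', 'C')
                    THEN strftime('%Y%m%d', plan_date) || printf('%11d', base_amount) END) AS rc
    FROM accnt_pln_info
    WHERE biz_date = '2017-05-31'
    GROUP BY account_number
) AS dt
ORDER BY account_number
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```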