SQL - Selecting a column value based on max value in another column and combination of values in another column - Teradata
My input Teradata table accnt_pln_info has the following sample data.
Account_number Plan_code Plan_Date Base_Amount Biz_Date
ACCT1 R 2017-JAN-01 100 2017-MAY-31
ACCT1 R 2017-JAN-11 30 2017-MAY-31
ACCT1 K 2017-JAN-22 80 2017-MAY-31
ACCT1 B 2017-JAN-13 50 2017-MAY-31
ACCT1 C 2017-JAN-18 180 2017-MAY-31
ACCT2 R 2017-JAN-12 70 2017-MAY-31
ACCT2 C 2017-JAN-02 90 2017-MAY-31
ACCT2 R 2017-JAN-08 10 2017-MAY-31
ACCT2 D 2017-JAN-02 40 2017-MAY-31
ACCT2 B 2017-FEB-24 14 2017-MAY-31
ACCT2 K 2017-FEB-12 79 2017-MAY-31
Expected output (for the filter condition Biz_Date = 2017-MAY-31):
Account_number RK_Plan_Date RK_Base_Amount RC_Plan_Date RC_Base_Amount
ACCT1 2017-JAN-22 80 2017-JAN-18 180
ACCT2 2017-FEB-12 79 2017-JAN-12 70
Logic:
Apply the filter condition Biz_Date = 2017-MAY-31, because the table holds multiple distinct biz_dates.
Group by Account_number. For Plan_Code in (R,K),
find the max Plan_Date and then get that row's Base_Amount;
for Plan_Code in (R,C), find the max Plan_Date and
then get that row's Base_Amount.
For example:
For ACCT1 and plan_code in ('R','K'), the max plan_date is 2017-JAN-22, so I need the Base_Amount of that row, which is 80.
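To make the logic above concrete, here is a minimal Python sketch that computes the intended result directly from the sample rows. The function name `latest_per_account` and the tuple layout are assumptions made for illustration only; they are not part of the Teradata table or of any answer in this thread.

```python
from datetime import date

BIZ = date(2017, 5, 31)

# Sample rows from the question:
# (account_number, plan_code, plan_date, base_amount, biz_date)
ROWS = [
    ("ACCT1", "R", date(2017, 1, 1), 100, BIZ),
    ("ACCT1", "R", date(2017, 1, 11), 30, BIZ),
    ("ACCT1", "K", date(2017, 1, 22), 80, BIZ),
    ("ACCT1", "B", date(2017, 1, 13), 50, BIZ),
    ("ACCT1", "C", date(2017, 1, 18), 180, BIZ),
    ("ACCT2", "R", date(2017, 1, 12), 70, BIZ),
    ("ACCT2", "C", date(2017, 1, 2), 90, BIZ),
    ("ACCT2", "R", date(2017, 1, 8), 10, BIZ),
    ("ACCT2", "D", date(2017, 1, 2), 40, BIZ),
    ("ACCT2", "B", date(2017, 2, 24), 14, BIZ),
    ("ACCT2", "K", date(2017, 2, 12), 79, BIZ),
]


def latest_per_account(rows, codes, biz_date):
    """Return {account: (plan_date, base_amount)} for the latest plan_date among `codes`."""
    best = {}
    for account, code, plan_date, amount, row_biz_date in rows:
        # Apply the biz_date filter and keep only the plan codes of this combination.
        if row_biz_date != biz_date or code not in codes:
            continue
        # Keep the row with the greatest plan_date seen so far for this account.
        if account not in best or plan_date > best[account][0]:
            best[account] = (plan_date, amount)
    return best


rk = latest_per_account(ROWS, {"R", "K"}, BIZ)
rc = latest_per_account(ROWS, {"R", "C"}, BIZ)

# One output line per account that has a row in both combinations.
for account in sorted(rk.keys() & rc.keys()):
    print(account, *rk[account], *rc[account])
```

Any SQL solution should reproduce these pairs; the sketch relies on the stated assumption that a plan date is unique within an account and combination.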
Assumptions:
There can be duplicates on Account_number and Plan_Code.
There will not be duplicates on Account_number, Plan_Code in (R,K) and Plan_Date.
There will not be duplicates on Account_number, Plan_Code in (R,C) and Plan_Date.
The order of the rows in the table is not guaranteed.
What I tried, which failed:
SELECT ACCOUNT_NUMBER,
MAX(CASE WHEN PLAN_DATE IN ('R','K') THEN PLAN_DATE END) MAX_RK_PLAN_DATE,
MAX(CASE WHEN PLAN_DATE IN ('R','K') AND MAX_PLAN_DATE=PLAN_DATE THEN BASE_AMOUNT END) REQUIRED_RK_AMOUNT,
MAX(CASE WHEN PLAN_DATE IN ('R','C') THEN PLAN_DATE END) MAX_RC_PLAN_DATE,
MAX(CASE WHEN PLAN_DATE IN ('R','C') AND MAX_PLAN_DATE=PLAN_DATE THEN BASE_AMOUNT END) REQUIRED_RC_AMOUNT
FROM ACCNT_PLN_INFO;
As expected, it failed, because I nested an aggregate function inside an ordinary CASE expression.
I thought about splitting the data into blocks like this:
SELECT ....
(SELECT ACCOUNT_NUMBER, 'RK',
MAX(PLAN_DATE) MAX_RK_PLAN_DATE FROM ACCNT_PLN_INFO WHERE
PLAN_DATE IN ('R','K')
UNION
SELECT ACCOUNT_NUMBER, 'RC',
MAX(PLAN_DATE) MAX_RC_PLAN_DATE FROM ACCNT_PLN_INFO WHERE
PLAN_DATE IN ('R','C') )
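and then joining that to the same table again in the outer select. But because of the different possible combinations (R,K) and (R,C), I could not make this work. I know how to do it when no combinations are involved.
For simplicity I have shown only 2 combinations of 2 values each: Plan_Code IN ('R','K') and Plan_Code IN ('R','C'). In reality there are 6 combinations, each with 4 values.
I have tried everything I could think of, but unfortunately without success. How do I select a column value when I need multiple combinations of values together with the max of another column? Thanks for your time.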
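Edit: rewritten using QUALIFY.
You need the max plan date for each plan_code pairing. You can get it in two separate derived tables, using qualify to keep only the row with the max plan date, and then join the two results together on account_number: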
select
    rk.account_number,
    rk.rk_plan_date,
    rk.base_amount as rk_base_amount,
    rc.rc_plan_date,
    rc.base_amount as rc_base_amount
from
(
    select
        account_number,
        plan_date as rk_plan_date,
        base_amount
    from ACCNT_PLN_INFO
    where plan_code in ('R','K')
      and biz_date = date '2017-05-31'  -- filter required by the question
    -- keep only the latest row per account
    qualify row_number() over (partition by account_number order by plan_date desc) = 1
) rk
inner join
(
    select
        account_number,
        plan_date as rc_plan_date,
        base_amount
    from ACCNT_PLN_INFO
    where plan_code in ('R','C')
      and biz_date = date '2017-05-31'
    qualify row_number() over (partition by account_number order by plan_date desc) = 1
) rc
on rk.account_number = rc.account_number
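The row_number approach can be tried without a Teradata system. The sketch below is an assumption-laden stand-in: it uses Python's bundled SQLite (which needs version 3.25 or later for window functions), stores dates as ISO text, and replaces QUALIFY, which SQLite does not have, with a wrapping subquery and `WHERE rn = 1`. The helper name `latest` is hypothetical.

```python
import sqlite3

# Sample rows from the question, with dates stored as ISO text.
ROWS = [
    ("ACCT1", "R", "2017-01-01", 100, "2017-05-31"),
    ("ACCT1", "R", "2017-01-11", 30, "2017-05-31"),
    ("ACCT1", "K", "2017-01-22", 80, "2017-05-31"),
    ("ACCT1", "B", "2017-01-13", 50, "2017-05-31"),
    ("ACCT1", "C", "2017-01-18", 180, "2017-05-31"),
    ("ACCT2", "R", "2017-01-12", 70, "2017-05-31"),
    ("ACCT2", "C", "2017-01-02", 90, "2017-05-31"),
    ("ACCT2", "R", "2017-01-08", 10, "2017-05-31"),
    ("ACCT2", "D", "2017-01-02", 40, "2017-05-31"),
    ("ACCT2", "B", "2017-02-24", 14, "2017-05-31"),
    ("ACCT2", "K", "2017-02-12", 79, "2017-05-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accnt_pln_info ("
    "account_number TEXT, plan_code TEXT, plan_date TEXT, "
    "base_amount INTEGER, biz_date TEXT)"
)
conn.executemany("INSERT INTO accnt_pln_info VALUES (?, ?, ?, ?, ?)", ROWS)


def latest(codes, biz_date="2017-05-31"):
    """Return {account: (plan_date, base_amount)} of the latest row among `codes`."""
    placeholders = ",".join("?" * len(codes))
    sql = f"""
        SELECT account_number, plan_date, base_amount
        FROM (
            SELECT account_number, plan_date, base_amount,
                   ROW_NUMBER() OVER (PARTITION BY account_number
                                      ORDER BY plan_date DESC) AS rn
            FROM accnt_pln_info
            WHERE plan_code IN ({placeholders})
              AND biz_date = ?
        )
        WHERE rn = 1  -- stands in for Teradata's QUALIFY ... = 1
    """
    rows = conn.execute(sql, (*codes, biz_date))
    return {account: (plan_date, amount) for account, plan_date, amount in rows}


print(latest(("R", "K")))
print(latest(("R", "C")))
```

Calling `latest` once per combination mirrors the one-derived-table-per-combination shape of the query above.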
Original (syntax not specific to Teradata):
select
    rk.account_number,
    rk.rk_plan_date,
    rk.base_amount as rk_base_amount,
    rc.rc_plan_date,
    rc.base_amount as rc_base_amount
from (
    select
        t.account_number,
        t.plan_date as rk_plan_date,
        t.base_amount
    from ACCNT_PLN_INFO t
    inner join (
        -- latest plan date per account among the R/K rows
        select
            account_number,
            max(plan_date) as plan_date
        from ACCNT_PLN_INFO
        where plan_code in ('R','K')
          and biz_date = date '2017-05-31'
        group by account_number
    ) mx
    on t.account_number = mx.account_number
    and t.plan_date = mx.plan_date
    and t.plan_code in ('R','K')
    and t.biz_date = date '2017-05-31'
) rk
inner join (
    select
        t.account_number,
        t.plan_date as rc_plan_date,
        t.base_amount
    from ACCNT_PLN_INFO t
    inner join (
        -- latest plan date per account among the R/C rows
        select
            account_number,
            max(plan_date) as plan_date
        from ACCNT_PLN_INFO
        where plan_code in ('R','C')
          and biz_date = date '2017-05-31'
        group by account_number
    ) mx
    on t.account_number = mx.account_number
    and t.plan_date = mx.plan_date
    and t.plan_code in ('R','C')
    and t.biz_date = date '2017-05-31'
) rc
on rk.account_number = rc.account_number
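With 6 combinations of 4 values each, repeating a derived table per combination gets long. One possible variation, sketched here purely as an assumption, is to keep the combinations in a small mapping table and run the same join-to-max pattern once. The table `plan_combo` and its columns are hypothetical, and SQLite via Python stands in for Teradata. The result comes back as one row per account and combination rather than pivoted into columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accnt_pln_info (
        account_number TEXT, plan_code TEXT, plan_date TEXT,
        base_amount INTEGER, biz_date TEXT);
    -- Hypothetical mapping table: one row per (combination, plan_code).
    CREATE TABLE plan_combo (combo TEXT, plan_code TEXT);
    INSERT INTO plan_combo VALUES ('RK', 'R'), ('RK', 'K'), ('RC', 'R'), ('RC', 'C');
""")
conn.executemany(
    "INSERT INTO accnt_pln_info VALUES (?, ?, ?, ?, '2017-05-31')",
    [
        ("ACCT1", "R", "2017-01-01", 100), ("ACCT1", "R", "2017-01-11", 30),
        ("ACCT1", "K", "2017-01-22", 80), ("ACCT1", "B", "2017-01-13", 50),
        ("ACCT1", "C", "2017-01-18", 180), ("ACCT2", "R", "2017-01-12", 70),
        ("ACCT2", "C", "2017-01-02", 90), ("ACCT2", "R", "2017-01-08", 10),
        ("ACCT2", "D", "2017-01-02", 40), ("ACCT2", "B", "2017-02-24", 14),
        ("ACCT2", "K", "2017-02-12", 79),
    ],
)

SQL = """
    SELECT m.account_number, m.combo, m.max_plan_date, t.base_amount
    FROM (
        -- Latest plan date per account and combination.
        SELECT i.account_number, ic.combo, MAX(i.plan_date) AS max_plan_date
        FROM accnt_pln_info AS i
        JOIN plan_combo AS ic ON ic.plan_code = i.plan_code
        WHERE i.biz_date = :biz_date
        GROUP BY i.account_number, ic.combo
    ) AS m
    -- Join back to fetch the amount of the row that carries that date.
    JOIN plan_combo AS c ON c.combo = m.combo
    JOIN accnt_pln_info AS t
      ON t.account_number = m.account_number
     AND t.plan_code = c.plan_code
     AND t.plan_date = m.max_plan_date
     AND t.biz_date = :biz_date
    ORDER BY m.account_number, m.combo
"""
result = conn.execute(SQL, {"biz_date": "2017-05-31"}).fetchall()
for row in result:
    print(row)
```

This relies on the question's assumption that a plan date is unique within an account and combination; otherwise the join back would return more than one row per group.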
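You can use an approach similar to your attempt by applying a dirty trick: piggybacking on the aggregate.
You combine both columns into a single string, apply MAX, and then split the parts out again. For example, for ACCT1, combining PLAN_DATE and BASE_AMOUNT into a string results in: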
'20170101 100'
'20170111 30'
'20170113 50'
'20170118 180'
'20170122 80' -- this will be returned by MAX
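After applying MAX, you extract the two columns again with SUBSTRING:
To_Date(Substr('20170122 80', 1, 8), 'yyyymmdd')
Cast(Substr('20170122 80', 9) AS INT)
Of course, you must build a string that still sorts in the right order, e.g. yyyymmdd for the date and a fixed width, including leading spaces, for the number.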
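The ordering requirement is easy to check outside the database. The following Python sketch is only an illustration: the helper names `pack` and `unpack` and the width of 11 characters are assumptions, not part of the answer's SQL. It builds the combined string, takes the maximum, and splits it again.

```python
from datetime import date, datetime


def pack(plan_date, amount, width=11):
    """Build a sortable key: yyyymmdd followed by a right-aligned, space-padded amount."""
    return plan_date.strftime("%Y%m%d") + str(amount).rjust(width)


def unpack(key):
    """Split a packed key back into (plan_date, amount)."""
    return datetime.strptime(key[:8], "%Y%m%d").date(), int(key[8:])


# The ACCT1 rows used in the illustration above.
acct1 = [
    (date(2017, 1, 1), 100),
    (date(2017, 1, 11), 30),
    (date(2017, 1, 13), 50),
    (date(2017, 1, 18), 180),
    (date(2017, 1, 22), 80),
]

# MAX over the packed strings picks the row with the latest date.
best = max(pack(d, a) for d, a in acct1)
print(repr(best))
print(unpack(best))

# Why the date format matters: strings compare character by character,
# so a day-first format would let the wrong row win.
assert max("31012016", "01022017") == "31012016"  # day-first: the older date wins
assert max("20160131", "20170201") == "20170201"  # year-first: the later date wins
```

Because the date comes first in the key, the amount only breaks ties between rows with the same date, which the question's assumptions rule out within a combination.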
Now some cut & paste & modify:
SELECT ACCOUNT_NUMBER,
To_Date(Substr(RK, 1,8), 'yyyymmdd') AS MAX_RK_PLAN_DATE,
Cast(Substring(RK From 9) AS INT) AS REQUIRED_RK_AMOUNT,
To_Date(Substr(RC, 1,8), 'yyyymmdd') AS MAX_RC_PLAN_DATE,
Cast(Substring(RC From 9) AS INT) AS REQUIRED_RC_AMOUNT
FROM
(
SELECT ACCOUNT_NUMBER,
Max(CASE WHEN PLAN_code IN ('R','K') THEN To_Char(PLAN_DATE, 'yyyymmdd') || BASE_AMOUNT END) AS RK,
Max(CASE WHEN PLAN_code IN ('R','C') THEN To_Char(PLAN_DATE, 'yyyymmdd') || BASE_AMOUNT END) AS RC
FROM ACCNT_PLN_INFO
WHERE biz_date = DATE '2017-05-31'
GROUP BY 1
) AS dt
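To check the piggyback query against the expected output without a Teradata system, here is a hedged re-creation in SQLite, run from Python. The substitutions are assumptions of this sketch: `strftime` and `printf('%11d', ...)` stand in for `To_Char` and Teradata's implicit number-to-string conversion, and the date is rebuilt with `substr` instead of `To_Date`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accnt_pln_info ("
    "account_number TEXT, plan_code TEXT, plan_date TEXT, "
    "base_amount INTEGER, biz_date TEXT)"
)
conn.executemany(
    "INSERT INTO accnt_pln_info VALUES (?, ?, ?, ?, '2017-05-31')",
    [
        ("ACCT1", "R", "2017-01-01", 100), ("ACCT1", "R", "2017-01-11", 30),
        ("ACCT1", "K", "2017-01-22", 80), ("ACCT1", "B", "2017-01-13", 50),
        ("ACCT1", "C", "2017-01-18", 180), ("ACCT2", "R", "2017-01-12", 70),
        ("ACCT2", "C", "2017-01-02", 90), ("ACCT2", "R", "2017-01-08", 10),
        ("ACCT2", "D", "2017-01-02", 40), ("ACCT2", "B", "2017-02-24", 14),
        ("ACCT2", "K", "2017-02-12", 79),
    ],
)

SQL = """
    SELECT account_number,
           -- Characters 1-8 hold yyyymmdd; rebuild an ISO date from them.
           substr(rk, 1, 4) || '-' || substr(rk, 5, 2) || '-' || substr(rk, 7, 2),
           -- Everything from character 9 on is the space-padded amount.
           CAST(trim(substr(rk, 9)) AS INTEGER),
           substr(rc, 1, 4) || '-' || substr(rc, 5, 2) || '-' || substr(rc, 7, 2),
           CAST(trim(substr(rc, 9)) AS INTEGER)
    FROM (
        SELECT account_number,
               MAX(CASE WHEN plan_code IN ('R', 'K')
                        THEN strftime('%Y%m%d', plan_date) || printf('%11d', base_amount)
                   END) AS rk,
               MAX(CASE WHEN plan_code IN ('R', 'C')
                        THEN strftime('%Y%m%d', plan_date) || printf('%11d', base_amount)
                   END) AS rc
        FROM accnt_pln_info
        WHERE biz_date = '2017-05-31'
        GROUP BY account_number
    ) AS dt
    ORDER BY account_number
"""
result = conn.execute(SQL).fetchall()
for row in result:
    print(row)
```

A single pass over the table serves every combination here, so supporting 6 combinations would mean adding one MAX(CASE ...) column and one pair of extraction expressions per combination.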