SUM OVER with GROUP BY
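I am working with a large database containing millions of rows, and I am trying to keep my queries efficient. The database holds regular snapshots of a loan portfolio in which loans occasionally default (the status changes from '1' to <> '1'). When that happens, the loan appears only once, in the corresponding snapshot, and is not reported again afterwards. I am trying to get a cumulative count of such loans: they build up over time and are split into many buckets by country of origin, vintage, and so on.
SUM(...) OVER looks like a very efficient way to get that result, but when I run the query below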
Select
assetcountry, edcode, vintage, aa25 as inclusionYrMo, poolcutoffdate, aa74 as status,
AA16 AS employment, AA36 AS product, AA48 AS newUsed, aa55 as customerType,
count(1) as Loans, sum(aa26) as OrigBal, sum(aa27) as CurBal,
SUM(count(1)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as LoanCountCumul,
SUM(aa27) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as CurBalCumul,
SUM(aa26) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as OrigBalCumul
from myDatabase
where aa22>='2014-01' and aa22<='2014-12' and vintage='2015' and active=0 and aa74<>'1'
group by assetcountry, edcode, vintage, aa25, aa74, aa16, aa36, aa48, aa55, poolcutoffdate
order by poolcutoffdate
I get
SQL Error (8120): Column 'aa27' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Can anyone shed some light on this? Thanks
I believe you want:
Select assetcountry, edcode, vintage, aa25 as inclusionYrMo, poolcutoffdate, aa74 as status,
AA16 AS employment, AA36 AS product, AA48 AS newUsed, aa55 as customerType,
count(1) as Loans, sum(aa26) as OrigBal, sum(aa27) as CurBal,
SUM(count(1)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as LoanCountCumul,
SUM(SUM(aa27)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as CurBalCumul,
SUM(SUM(aa26)) OVER (ORDER BY [poolcutoffdate] ROWS UNBOUNDED PRECEDING) as OrigBalCumul
from myDatabase
where aa22 >= '2014-01' and aa22 <= '2014-12' and vintage = '2015' and
active = 0 and aa74 <> '1'
group by assetcountry, edcode, vintage, aa25, aa74, aa16, aa36, aa48, aa55, poolcutoffdate
order by poolcutoffdate;
Note the SUM(SUM()) in the cumulative sum expressions.
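For anyone who wants to try the nested form without the real data, here is a minimal sketch. It rests on a few assumptions: it uses SQLite through Python's sqlite3 module instead of SQL Server (window functions need SQLite 3.25 or later), and the `loans` table and its values are made up. The idea is that window functions are evaluated after GROUP BY, so their argument can only be a grouped column or an aggregate: the inner SUM(aa27) is the per-group total, and the outer SUM(...) OVER turns those totals into a running total. SQLite is more lenient than SQL Server about ungrouped columns, so it would not raise error 8120 for the original query; the sketch only shows the corrected form.

```python
import sqlite3

# Hypothetical toy table: one row per loan per snapshot date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loans (poolcutoffdate TEXT, aa27 REAL);
    INSERT INTO loans VALUES
        ('2015-01-31', 100.0), ('2015-01-31', 50.0),
        ('2015-02-28', 200.0),
        ('2015-03-31', 25.0),  ('2015-03-31', 75.0);
""")

# Inner aggregates (COUNT, SUM) are computed per group;
# the outer SUM ... OVER accumulates them across the grouped rows.
rows = conn.execute("""
    SELECT poolcutoffdate,
           COUNT(1)  AS Loans,
           SUM(aa27) AS CurBal,
           SUM(COUNT(1))  OVER (ORDER BY poolcutoffdate
                                ROWS UNBOUNDED PRECEDING) AS LoanCountCumul,
           SUM(SUM(aa27)) OVER (ORDER BY poolcutoffdate
                                ROWS UNBOUNDED PRECEDING) AS CurBalCumul
    FROM loans
    GROUP BY poolcutoffdate
    ORDER BY poolcutoffdate
""").fetchall()

for row in rows:
    print(row)
```

The toy query groups by the date only, so each date yields exactly one row. With several groups per date, as in the full query, rows that tie on poolcutoffdate have no defined order inside a ROWS frame, so the intermediate running totals within one date can vary between runs.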
This is what I found to work, after comparing my results with some external research data.
I have simplified the fields for readability:
select
poolcutoffdate,
count(1) as LoanCount,
MAX(sum(case status when 'default' then 1 else 0 end))
over (order by poolcutoffdate
ROWS between unbounded preceding AND CURRENT ROW) as CumulDefaults
from myDatabase
group by poolcutoffdate
order by poolcutoffdate asc
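This way I count all loans that have been in 'default' status at least once between the start and the current cutoff date.
Note the use of MAX(SUM()), so that the result is the largest per-date value among the rows from the first one up to the current one. Using SUM(SUM()) would add those per-date values together, which gives an accumulation of amounts that are already cumulative.
I considered using SUM(SUM()) with "PARTITION BY poolcutoffdate", so that the count restarts from 0 and does not add on from the previous cutoff date, but that would only include loans from the latest cutoff date, so a loan that defaulted and was then removed from the pool would wrongly not be counted.
Note the CASE inside the aggregate that OVER is applied to.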
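To make the difference between the two nested forms concrete, here is a small sketch on made-up data. It assumes SQLite 3.25 or later via Python's sqlite3 module, not SQL Server, and the `snapshots` table and its rows are hypothetical. It puts the per-date default count next to its running sum and its running maximum. Which of the two is the right cumulative figure depends on the data: if each snapshot reports only newly defaulted loans, the running sum is the cumulative count; if a snapshot's count already includes loans that defaulted earlier, the running maximum avoids counting them twice.

```python
import sqlite3

# Hypothetical toy table: one row per loan per snapshot date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshots (poolcutoffdate TEXT, status TEXT);
    INSERT INTO snapshots VALUES
        ('2015-01-31', 'default'), ('2015-01-31', 'performing'),
        ('2015-02-28', 'default'), ('2015-02-28', 'default'),
        ('2015-02-28', 'default'), ('2015-02-28', 'performing'),
        ('2015-03-31', 'default'), ('2015-03-31', 'default');
""")

# The inner SUM(CASE ...) counts defaults per cutoff date; the outer
# SUM / MAX ... OVER combine those per-date counts up to the current row.
result = conn.execute("""
    SELECT poolcutoffdate,
           SUM(CASE status WHEN 'default' THEN 1 ELSE 0 END) AS Defaults,
           SUM(SUM(CASE status WHEN 'default' THEN 1 ELSE 0 END))
               OVER (ORDER BY poolcutoffdate
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningSum,
           MAX(SUM(CASE status WHEN 'default' THEN 1 ELSE 0 END))
               OVER (ORDER BY poolcutoffdate
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningMax
    FROM snapshots
    GROUP BY poolcutoffdate
    ORDER BY poolcutoffdate
""").fetchall()

for row in result:
    print(row)
```

On this toy data the per-date counts are 1, 3 and 2, so the running sum is 1, 4, 6 and the running maximum is 1, 3, 3.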
Thanks, everyone, for your help.