SQL group by a column but segmented based on another column
I have this table, which contains about 100,000+ rows and 3 columns:
- Account_number
- Report_date
- Outstanding_amount
I need a statement that groups the outstanding amounts by account, but also segments them based on the date. Sample data for one account:
+----------------+-------------+--------------------+
| account_number | report_date | outstanding_amount |
+----------------+-------------+--------------------+
| 1              | 02/01/2019  | 100                |
| 1              | 03/01/2019  | 100                |
| 1              | 06/01/2019  | 200                |
| 1              | 07/01/2019  | 300                |
| 1              | 10/01/2019  | 200                |
| 1              | 11/01/2019  | 200                |
| 1              | 12/01/2019  | 100                |
+----------------+-------------+--------------------+
So if I run this statement:
select account_number, min(report_date) as mindate, max(report_date) as maxdate, outstanding_amount
from table1
group by account_number, outstanding_amount;
the result of this statement looks like this:
+----------------+------------+------------+--------------------+
| account_number | mindate    | maxdate    | outstanding_amount |
+----------------+------------+------------+--------------------+
| 1              | 02/01/2019 | 12/01/2019 | 100                |
| 1              | 06/01/2019 | 11/01/2019 | 200                |
| 1              | 07/01/2019 | 07/01/2019 | 300                |
+----------------+------------+------------+--------------------+
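To reproduce the problem locally, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's standard `sqlite3` module and runs the plain `GROUP BY`. Assumptions: the table is named `table1` as in the question, dates are stored as ISO-8601 text so they sort chronologically (the sample's `02/01/2019` style is read as dd/mm/yyyy here; reading it as mm/dd/yyyy gives the same ordering), and an `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

# Sample rows for account 1 from the question. Dates are ISO-8601 text
# (assuming the original is dd/mm/yyyy) so that they sort chronologically.
ROWS = [
    (1, "2019-01-02", 100),
    (1, "2019-01-03", 100),
    (1, "2019-01-06", 200),
    (1, "2019-01-07", 300),
    (1, "2019-01-10", 200),
    (1, "2019-01-11", 200),
    (1, "2019-01-12", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table1 ("
    "account_number integer, report_date text, outstanding_amount integer)"
)
conn.executemany("insert into table1 values (?, ?, ?)", ROWS)

# The plain GROUP BY from the question: one group per (account, amount),
# whether or not the rows with that amount are adjacent in time.
# ORDER BY is added only to make the output deterministic.
PLAIN_GROUP_BY = """
select account_number,
       min(report_date) as mindate,
       max(report_date) as maxdate,
       outstanding_amount
from table1
group by account_number, outstanding_amount
order by account_number, outstanding_amount
"""

result = conn.execute(PLAIN_GROUP_BY).fetchall()
for row in result:
    print(row)
```

Because the grouping key is only `(account_number, outstanding_amount)`, the two separate runs of `100` and the two separate runs of `200` are merged, which is why their date ranges overlap.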
So I want to split the results so that the days between one row's mindate and maxdate do not overlap with those of the next row. The result I am looking for is:
+----------------+------------+------------+--------------------+
| account_number | mindate    | maxdate    | outstanding_amount |
+----------------+------------+------------+--------------------+
| 1              | 02/01/2019 | 03/01/2019 | 100                |
| 1              | 06/01/2019 | 06/01/2019 | 200                |
| 1              | 07/01/2019 | 07/01/2019 | 300                |
| 1              | 10/01/2019 | 11/01/2019 | 200                |
| 1              | 12/01/2019 | 12/01/2019 | 100                |
+----------------+------------+------------+--------------------+
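The desired output amounts to "collapse consecutive runs": walk each account's rows in date order and start a new group whenever the amount changes. To pin down that target semantics outside SQL, here is a small sketch using Python's `itertools.groupby`, which merges only *adjacent* items with equal keys. The `islands` helper is a hypothetical name, and the ISO-8601 date strings are an assumption so that plain string sorting is chronological.

```python
from itertools import groupby

# Sample rows for account 1: (account_number, report_date, outstanding_amount).
# ISO-8601 date text is an assumption so that string order is date order.
ROWS = [
    (1, "2019-01-02", 100),
    (1, "2019-01-03", 100),
    (1, "2019-01-06", 200),
    (1, "2019-01-07", 300),
    (1, "2019-01-10", 200),
    (1, "2019-01-11", 200),
    (1, "2019-01-12", 100),
]


def islands(rows):
    """Hypothetical helper: collapse consecutive rows with the same amount.

    Rows are first sorted by (account_number, report_date); groupby then only
    merges *adjacent* rows whose (account_number, outstanding_amount) match.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    out = []
    for (account, amount), run in groupby(ordered, key=lambda r: (r[0], r[2])):
        dates = [r[1] for r in run]
        # The run is in date order, so its first/last dates are the min/max.
        out.append((account, dates[0], dates[-1], amount))
    return out


for row in islands(ROWS):
    print(row)
```

Each printed tuple corresponds to one row of the desired result table.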
Is it possible to construct such a statement?
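To flatten the data, collapse it by a calculated rank: flag every row whose amount differs from the previous row's, and use the running sum of those flags as the rank of each run.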
select account_number
     , min(report_date) as mindate
     , max(report_date) as maxdate
     , outstanding_amount
from
(
    -- rnk: running total of flag, so each run of equal amounts gets its own number
    select q1.*
         , sum(flag) over (partition by account_number order by report_date) as rnk
    from
    (
        -- flag = 1 when the amount differs from the previous row's (or there is no previous row)
        select t.*
             , case when outstanding_amount = lag(outstanding_amount, 1) over (partition by account_number order by report_date)
                    then 0 else 1 end as flag
        from table1 t
    ) q1
) q2
group by account_number, outstanding_amount, rnk
order by account_number, mindate;
Test on db<>fiddle here
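To see what the two inner steps of this answer compute, the sketch below runs them against the sample data in SQLite via Python's `sqlite3`. Assumptions: the SQLite library bundled with your Python is 3.25 or newer (needed for `lag` and other window functions), the table is `table1`, and dates are stored as ISO-8601 text so they sort chronologically. It prints `flag` and the running sum `rnk` for each row, then the answer's full query.

```python
import sqlite3

# Sample rows for account 1; ISO-8601 date text is an assumption so that
# dates sort chronologically. Window functions need SQLite 3.25+.
ROWS = [
    (1, "2019-01-02", 100),
    (1, "2019-01-03", 100),
    (1, "2019-01-06", 200),
    (1, "2019-01-07", 300),
    (1, "2019-01-10", 200),
    (1, "2019-01-11", 200),
    (1, "2019-01-12", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table1 ("
    "account_number integer, report_date text, outstanding_amount integer)"
)
conn.executemany("insert into table1 values (?, ?, ?)", ROWS)

# Intermediate columns: flag = 1 when the amount differs from the previous
# row's (or there is no previous row); rnk = running sum of flag.
STEPS = """
select report_date, outstanding_amount, flag,
       sum(flag) over (partition by account_number order by report_date) as rnk
from (
    select t.*,
           case when outstanding_amount = lag(outstanding_amount, 1)
                     over (partition by account_number order by report_date)
                then 0 else 1 end as flag
    from table1 t
) q1
order by account_number, report_date
"""

# The answer's full query, unchanged apart from layout.
FINAL = """
select account_number, min(report_date) as mindate,
       max(report_date) as maxdate, outstanding_amount
from (
    select q1.*,
           sum(flag) over (partition by account_number order by report_date) as rnk
    from (
        select t.*,
               case when outstanding_amount = lag(outstanding_amount, 1)
                         over (partition by account_number order by report_date)
                    then 0 else 1 end as flag
        from table1 t
    ) q1
) q2
group by account_number, outstanding_amount, rnk
order by account_number, mindate
"""

steps = conn.execute(STEPS).fetchall()
final = conn.execute(FINAL).fetchall()
for row in steps:
    print(row)
print()
for row in final:
    print(row)
```

Every change of amount bumps `rnk`, so the two runs of `200` get different `rnk` values (2 and 4) and are no longer merged by the `GROUP BY`.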
This is a gaps-and-islands problem. In this case, the simplest solution is probably the difference of row numbers:
select account_number, outstanding_amount,
       min(report_date) as mindate, max(report_date) as maxdate
from (select t.*,
             -- position of the row among all of the account's rows
             row_number() over (partition by account_number order by report_date) as seqnum,
             -- position of the row among the account's rows with the same amount
             row_number() over (partition by account_number, outstanding_amount order by report_date) as seqnum_o
      from table1 t
     ) t
group by account_number, outstanding_amount, (seqnum - seqnum_o)
order by account_number, min(report_date);
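Why this works is a bit tricky to explain, but if you look at the results of the subquery you will see how the difference of the row numbers identifies adjacent rows with the same amount. Within a run of consecutive rows with the same amount, both row numbers increase by 1 per row, so their difference stays constant; while rows with other amounts intervene, `seqnum` keeps counting but `seqnum_o` does not, so the next run with that amount gets a larger difference.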
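Following the suggestion to look at the subquery, this sketch runs it against the sample data in SQLite via Python's `sqlite3` (assumptions: SQLite 3.25+ for window functions, table `table1`, dates stored as ISO-8601 text). It prints `seqnum`, `seqnum_o` and their difference for each row, then the full query.

```python
import sqlite3

# Sample rows for account 1; ISO-8601 date text is an assumption so that
# dates sort chronologically. Window functions need SQLite 3.25+.
ROWS = [
    (1, "2019-01-02", 100),
    (1, "2019-01-03", 100),
    (1, "2019-01-06", 200),
    (1, "2019-01-07", 300),
    (1, "2019-01-10", 200),
    (1, "2019-01-11", 200),
    (1, "2019-01-12", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table1 ("
    "account_number integer, report_date text, outstanding_amount integer)"
)
conn.executemany("insert into table1 values (?, ?, ?)", ROWS)

# The answer's subquery, plus the difference it groups by.
SUBQUERY = """
select report_date, outstanding_amount, seqnum, seqnum_o,
       seqnum - seqnum_o as diff
from (
    select t.*,
           row_number() over (partition by account_number
                              order by report_date) as seqnum,
           row_number() over (partition by account_number, outstanding_amount
                              order by report_date) as seqnum_o
    from table1 t
) s
order by account_number, report_date
"""

# The answer's full query, unchanged apart from layout.
FINAL = """
select account_number, outstanding_amount,
       min(report_date) as mindate, max(report_date) as maxdate
from (
    select t.*,
           row_number() over (partition by account_number
                              order by report_date) as seqnum,
           row_number() over (partition by account_number, outstanding_amount
                              order by report_date) as seqnum_o
    from table1 t
) t
group by account_number, outstanding_amount, (seqnum - seqnum_o)
order by account_number, min(report_date)
"""

sub = conn.execute(SUBQUERY).fetchall()
final = conn.execute(FINAL).fetchall()
for row in sub:
    print(row)
print()
for row in final:
    print(row)
```

The difference on its own is not unique: the `300` row and the second run of `200` both get 3. That is why `outstanding_amount` has to stay in the `GROUP BY` next to `(seqnum - seqnum_o)`.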