SQL: group by a column but segment based on another column

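I have this table, which contains roughly 100,000+ rows and 3 columns:

I need to produce a report that groups the outstanding amount by account but also splits it by date. Sample data for 1 account: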

+----------------+-------------+--------------------+
| account_number | report_date | outstanding_amount |
+----------------+-------------+--------------------+
|              1 | 02/01/2019  |                100 |
|              1 | 03/01/2019  |                100 |
|              1 | 06/01/2019  |                200 |
|              1 | 07/01/2019  |                300 |
|              1 | 10/01/2019  |                200 |
|              1 | 11/01/2019  |                200 |
|              1 | 12/01/2019  |                100 |
+----------------+-------------+--------------------+

So if I run this statement:

select * from (select account_number, min(report_date) mindate, max(report_date) maxdate, outstanding_amount from table1 group by account_number, outstanding_amount) t

The result of this statement looks like this:

+----------------+------------+------------+--------------------+
| account_number |  mindate   |  maxdate   | outstanding_amount |
+----------------+------------+------------+--------------------+
|              1 | 02/01/2019 | 12/01/2019 |                100 |
|              1 | 06/01/2019 | 11/01/2019 |                200 |
|              1 | 07/01/2019 | 07/01/2019 |                300 |
+----------------+------------+------------+--------------------+

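What I want instead is to split the result so that the date range between mindate and maxdate of one row does not overlap with that of the next row. The result I am looking for is this: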

+----------------+------------+------------+--------------------+
| account_number |  mindate   |  maxdate   | outstanding_amount |
+----------------+------------+------------+--------------------+
|              1 | 02/01/2019 | 03/01/2019 |                100 |
|              1 | 06/01/2019 | 06/01/2019 |                200 |
|              1 | 07/01/2019 | 07/01/2019 |                300 |
|              1 | 10/01/2019 | 11/01/2019 |                200 |
|              1 | 12/01/2019 | 12/01/2019 |                100 |
+----------------+------------+------------+--------------------+

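Is it possible to construct such a statement?

To collapse the data, group it by a calculated rank: flag every row whose amount differs from the previous row's, take the running sum of those flags as the rank, and then group by that rank.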

select account_number
, min(report_date) as mindate
, max(report_date) as maxdate
, outstanding_amount
from
(
    select q1.*
    , sum(flag) over (partition by account_number order by report_date) as rnk
    from
    (
        select t.*
        , case when outstanding_amount = lag(outstanding_amount, 1) over (partition by account_number order by report_date) then 0 else 1 end as flag
        from table1 t
    ) q1
) q2
group by account_number, outstanding_amount, rnk
order by account_number, mindate;

Tested on db<>fiddle.
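To see where the flag and rnk columns in the query above come from, the inner subqueries can be run on their own against the sample rows. This is only a minimal sketch: it assumes Python's built-in sqlite3 module (with an SQLite build of 3.25 or later, which is needed for window functions) as a stand-in for the actual database, and the in-memory table with text-typed dates is an assumption made for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
# Dates are kept as the question's text values; they sort correctly here
# only because the sample strings differ in their first two characters alone.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table1 (account_number int, report_date text, outstanding_amount int)"
)
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [
        (1, "02/01/2019", 100),
        (1, "03/01/2019", 100),
        (1, "06/01/2019", 200),
        (1, "07/01/2019", 300),
        (1, "10/01/2019", 200),
        (1, "11/01/2019", 200),
        (1, "12/01/2019", 100),
    ],
)

# flag is 1 when the amount differs from the previous row's amount
# (or when there is no previous row); rnk is the running total of the
# flags, so it goes up by one each time the amount changes.
rows = conn.execute(
    """
    select report_date, outstanding_amount, flag,
           sum(flag) over (partition by account_number order by report_date) as rnk
    from (
        select t.*,
               case when outstanding_amount = lag(outstanding_amount, 1)
                         over (partition by account_number order by report_date)
                    then 0 else 1 end as flag
        from table1 t
    ) q1
    order by account_number, report_date
    """
).fetchall()

for row in rows:
    print(row)
```

On the sample data the flags come out as 1, 0, 1, 1, 1, 0, 1 and the ranks as 1, 1, 2, 3, 4, 4, 5, so each run of equal consecutive amounts gets its own rank, which is what the outer group by relies on.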

This is a gaps-and-islands problem. In this case, the simplest solution is probably the difference of row numbers:

select account_number, outstanding_amount,
       min(report_date), max(report_date)
from (select t.*,
             row_number() over (partition by account_number order by report_date) as seqnum,
             row_number() over (partition by account_number, outstanding_amount order by report_date) as seqnum_o
      from t
     ) t
group by account_number, outstanding_amount, (seqnum - seqnum_o)
order by account_number, min(report_date);

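Why this works is a bit tricky to explain. But if you look at the results of the subquery, you will be able to see how the difference of the row numbers identifies adjacent rows with the same amount.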
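Looking at the subquery results can be done directly on the sample rows. The sketch below is a hedged illustration rather than part of the answer: it assumes Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in database, a table named t as in the query, dates stored as text, and a hypothetical column alias grp for the difference.

```python
import sqlite3

# Assumption: an in-memory SQLite table named t holds the sample rows.
# The text dates sort correctly here only because the sample strings
# differ in their first two characters alone.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (account_number int, report_date text, outstanding_amount int)"
)
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "02/01/2019", 100),
        (1, "03/01/2019", 100),
        (1, "06/01/2019", 200),
        (1, "07/01/2019", 300),
        (1, "10/01/2019", 200),
        (1, "11/01/2019", 200),
        (1, "12/01/2019", 100),
    ],
)

# seqnum numbers all rows of the account by date; seqnum_o numbers the
# rows per (account, amount). Their difference (grp, a hypothetical
# alias) stays constant while the amount does not change.
rows = conn.execute(
    """
    select report_date, outstanding_amount, seqnum, seqnum_o,
           seqnum - seqnum_o as grp
    from (select t.*,
                 row_number() over (partition by account_number
                                    order by report_date) as seqnum,
                 row_number() over (partition by account_number, outstanding_amount
                                    order by report_date) as seqnum_o
          from t
         ) s
    order by account_number, report_date
    """
).fetchall()

for row in rows:
    print(row)
```

For the sample account the differences are 0, 0, 2, 3, 3, 3, 4. The value 3 shows up both for the single 300 row and for the later pair of 200 rows, which is why outstanding_amount has to stay in the group by next to the difference.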