Collect data from a table with two levels of GROUP BY

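I am trying to collect data from an Oracle database table using GROUP BY. I think I need two levels of GROUP BY, but I can't work out how to finish my query.

I have a STATUS table with millions of status rows, related to REQUEST like this: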

REQUEST    STATUS
-------    -----------
ID      -> REQUEST_ID
...        ID
           STATUS_CODE
           ....

Example flow of a request (STATUS table):

SELECT ... FROM STATUS WHERE REQUEST_ID = 1 ORDER BY ID;

ID      REQUEST_ID  STATUS_CODE  STATUS_ALIAS                       CREATED
1       1           201          REQUEST_SAVED
2       1           204          REQUEST_SIGNATURE_VALID
3       1           210          REQUEST_XML_VALID
4       1           280          REQUEST_ACCEPTED

5       1           310          SENT_TO_SYSTEM_1_FOR_VERIFICATION
6       1           320          SENT_TO_SYSTEM_2_FOR_VERIFICATION
7       1           521          SYSTEM_1_VERIFICATION_ERROR
8       1           511          SYSTEM_2_VERIFICATION_ERROR

24880   1           310          SENT_TO_SYSTEM_1_FOR_VERIFICATION
24881   1           320          SENT_TO_SYSTEM_2_FOR_VERIFICATION
24885   1           620          SYSTEM_1_VERIFICATION_TIMEOUT
24886   1           610          SYSTEM_2_VERIFICATION_TIMEOUT

24887   1           310          SENT_TO_SYSTEM_1_FOR_VERIFICATION
24888   1           320          SENT_TO_SYSTEM_2_FOR_VERIFICATION
.....

I want to collect the REQUEST_IDs that are in a VERIFICATION state but have not yet timed out, like this:

24887   1           310          SENT_TO_SYSTEM_1_FOR_VERIFICATION
.....

This is how I select the data:

SELECT REQUEST_ID, STATUS_CODE, MAX(ID) FROM STATUS
GROUP BY REQUEST_ID, STATUS_CODE HAVING STATUS_CODE = 310;

REQUEST_ID  STATUS_CODE  MAX(ID)
1           310          24887

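This correctly shows the ID from which I need to filter the status records grouped by REQUEST_ID, but when I combine this query with an outer SELECT to show the REQUEST_IDs, it does not work.

This is my best attempt so far: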

SELECT T1.REQUEST_ID FROM STATUS T1
GROUP BY T1.REQUEST_ID, T1.ID HAVING T1.ID >= (
   SELECT MAX(ID) FROM STATUS T2
   GROUP BY T2.REQUEST_ID, T2.STATUS_CODE
   HAVING T2.STATUS_CODE IN (310, 320) AND NOT IN (610, 620)
);

ORA-01427: single-row subquery returns more than one row
01427. 00000 -  "single-row subquery returns more than one row"

Update

The problem with the suggested solutions is the following. Let's assume the flow continues like this:

24887   1           310          SENT_TO_SYSTEM_1_FOR_VERIFICATION
24888   1           320          SENT_TO_SYSTEM_2_FOR_VERIFICATION

24889   1           460          SYSTEM_2_VERIFICATION_OK
24890   1           510          SYSTEM_1_VERIFICATION_ERROR

Then, if there is no further response from system 1 within, say, 10 minutes, I need to add a timeout for system 1 only:

24891   1           620          SYSTEM_1_VERIFICATION_TIMEOUT

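But only once. That is why the query must filter out 620. Otherwise request ID 1 appears in the result set again on the next run, even though the timeout flag was already set by the previous check.

Update 2

I could write the appropriate “WHERE” condition at the Java level and use lambda filters to find the requests in a 'stuck' state that need a timeout status added. But then I would always have to loop over the whole STATUS table from Java and run my Java logic on each group of rows sharing a REQUEST_ID. That is ugly and time-consuming; it would run for so long that this solution is not workable. Maybe I need a stored procedure? That is why I would like a “super” SQL query that returns the IDs of the stuck requests, so that I can set the timeout flag for the requests with those IDs.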

I may be confused, but I think all you need is:

SELECT REQUEST_ID, STATUS_CODE, MAX(ID) 
  FROM STATUS
 WHERE STATUS_CODE IN (310, 320)
 GROUP BY REQUEST_ID, STATUS_CODE;

The condition T2.STATUS_CODE IN (310, 320) AND NOT IN (610, 620) does not make sense: once you restrict the status code to 310/320, it can never be 610/620.
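A "no later timeout" condition has to compare rows with each other, which a per-row predicate on STATUS_CODE cannot do. One way to express it is a correlated NOT EXISTS. This is only a minimal sketch, assuming that ID grows with insertion order and that any 610/620 row written after a 310/320 row of the same request cancels it; the aliases s and t are chosen for illustration:

```sql
-- Sketch: latest 310/320 row per request and status code,
-- ignoring rows that are followed by a timeout (610/620).
-- Assumption: a larger ID means the row was written later.
SELECT s.REQUEST_ID, s.STATUS_CODE, MAX(s.ID) AS MAX_ID
  FROM STATUS s
 WHERE s.STATUS_CODE IN (310, 320)
   AND NOT EXISTS (
         SELECT 1
           FROM STATUS t
          WHERE t.REQUEST_ID = s.REQUEST_ID
            AND t.STATUS_CODE IN (610, 620)
            AND t.ID > s.ID          -- only timeouts written after this row
       )
 GROUP BY s.REQUEST_ID, s.STATUS_CODE;
```

On the sample rows shown in the question this should keep only IDs 24887 and 24888, because the earlier 310/320 rows are followed by the timeouts 24885 and 24886. Once a row such as 24891 (620) is added, request 1 drops out of the result, which matches the "only once" requirement in the update.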

In HAVING T2.STATUS_CODE IN (310, 320) AND NOT IN (610, 620), the second clause adds nothing: if the code is in (310, 320), it cannot be in (610, 620). See the dbFiddle link below for the schema, tests, and other queries.

SELECT 
   REQUEST_ID, 
   STATUS_CODE, 
   MAX(ID) AS MAX_ID
 FROM STATUS
 WHERE STATUS_CODE IN (310, 320)
 GROUP BY 
   REQUEST_ID, 
   STATUS_CODE;

REQUEST_ID | STATUS_CODE | MAX_ID
---------: | ----------: | -----:
         1 |         310 |  24887
         1 |         320 |  24888

db<>fiddle here

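In Oracle, you can use the LAST aggregate function in the HAVING clause to filter by the final status of each request.

In any DBMS that supports window functions, you can mark the last row with row_number() and then filter on it.

Assuming the ID column is always increasing (or replacing it with a column that is), you get: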

create table t (ID, REQUEST_ID, STATUS_CODE, STATUS_ALIAS)
as
select 1, 1, 201, 'REQUEST_SAVED' from dual union all
select 2, 1, 204, 'REQUEST_SIGNATURE_VALID' from dual union all
select 3, 1, 210, 'REQUEST_XML_VALID' from dual union all
select 4, 1, 280, 'REQUEST_ACCEPTED' from dual union all
select 5, 1, 310, 'SENT_TO_SYSTEM_1_FOR_VERIFICATION' from dual union all
select 6, 1, 320, 'SENT_TO_SYSTEM_2_FOR_VERIFICATION' from dual union all
select 7, 1, 521, 'SYSTEM_1_VERIFICATION_ERROR' from dual union all
select 8, 1, 511, 'SYSTEM_2_VERIFICATION_ERROR' from dual union all
select 24880, 1, 310, 'SENT_TO_SYSTEM_1_FOR_VERIFICATION' from dual union all
select 24881, 1, 320, 'SENT_TO_SYSTEM_2_FOR_VERIFICATION' from dual union all
select 24885, 1, 620, 'SYSTEM_1_VERIFICATION_TIMEOUT' from dual union all
select 24886, 1, 610, 'SYSTEM_2_VERIFICATION_TIMEOUT' from dual union all
select 24887, 1, 310, 'SENT_TO_SYSTEM_1_FOR_VERIFICATION' from dual union all
select 24888, 1, 320, 'SENT_TO_SYSTEM_2_FOR_VERIFICATION' from dual union all
select 30000, 2, 201, 'REQUEST_SAVED' from dual union all
select 30001, 2, 204, 'REQUEST_SIGNATURE_VALID' from dual union all
select 30002, 2, 210, 'REQUEST_XML_VALID' from dual union all
select 30003, 2, 280, 'REQUEST_ACCEPTED' from dual union all
select 30004, 2, 310, 'SENT_TO_SYSTEM_1_FOR_VERIFICATION' from dual union all
select 30005, 2, 320, 'SENT_TO_SYSTEM_2_FOR_VERIFICATION' from dual union all
select 30006, 2, 521, 'SYSTEM_1_VERIFICATION_ERROR' from dual

select
  request_id
  , max(status_alias) keep(dense_rank last order by id asc) as final_status
from t
/*To restrict input as much as possible*/
where status_code >= 310
group by request_id
having max(status_code) keep(dense_rank last order by id asc) in (310, 320)

REQUEST_ID | FINAL_STATUS                     
---------: | :--------------------------------
         1 | SENT_TO_SYSTEM_2_FOR_VERIFICATION

with a as (
  select
    t.*
    , row_number() over(
      partition by
        request_id
      order by
        id desc
    ) as rn
  from t
  where status_code >= 310
)
select *
from a
where rn = 1
  and status_code in (310, 320)

   ID | REQUEST_ID | STATUS_CODE | STATUS_ALIAS                      | RN
----: | ---------: | ----------: | :-------------------------------- | -:
24888 |          1 |         320 | SENT_TO_SYSTEM_2_FOR_VERIFICATION |  1

db<>fiddle here
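
Neither query looks at time yet. If only requests whose last status has been waiting longer than the 10 minutes mentioned in the question should be flagged, the LAST variant can probably be extended with a condition on CREATED. A minimal sketch, assuming STATUS.CREATED is a DATE or TIMESTAMP filled in when the row is inserted (the question shows the column but no values); the sample table t has no such column, so this is written against STATUS:

```sql
-- Sketch only. Assumptions: CREATED holds the insert time of the row,
-- and ID increases with insertion order.
select
  request_id
  , max(id) as last_id   -- ID of the latest row with status_code >= 310
from status
where status_code >= 310
group by request_id
having max(status_code) keep(dense_rank last order by id asc) in (310, 320)
   and max(created) keep(dense_rank last order by id asc) < sysdate - interval '10' minute;
```

The returned request IDs are the candidates for a new timeout row. Because a timeout row then becomes the last status, the same request should not be returned again on the next run.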