SQL Challenge/Puzzle: 给定堆栈跟踪 - 如何找到每个时间点的顶部元素?

SQL Challenge/Puzzle: Given a stack trace - How to find the top element at each point in time?



谜题

A table, stack_trace ,包含以下列:

例如

(NULL 值在这里表示为空格)

数据:

i   op  val 
--  --  --  
1   I   A   
2   I   B   
3   O
4   I   C
5   O    
6   O   

要求的结果:

i   top_of_stack_val
--  ----------------
1   A
2   B
3   A
4   C
5   A
6   

要求

示例数据

create table stack_trace
(
    i       int
   ,op      char(1)
   ,val     char(1)
)
;

insert into stack_trace (i,op,val) values (1,'I','A');
insert into stack_trace (i,op,val) values (2,'I','B');
insert into stack_trace (i,op,val) values (3,'I','C');
insert into stack_trace (i,op,val) values (4,'I','D');
insert into stack_trace (i,op,val) values (5,'I','E');
insert into stack_trace (i,op)     values (6,'O');
insert into stack_trace (i,op)     values (7,'O');
insert into stack_trace (i,op)     values (8,'O');
insert into stack_trace (i,op,val) values (9,'I','F');
insert into stack_trace (i,op)     values (10,'O');
insert into stack_trace (i,op,val) values (11,'I','G');
insert into stack_trace (i,op,val) values (12,'I','H');
insert into stack_trace (i,op)     values (13,'O');
insert into stack_trace (i,op)     values (14,'O');
insert into stack_trace (i,op,val) values (15,'I','I');
insert into stack_trace (i,op,val) values (16,'I','J');
insert into stack_trace (i,op,val) values (17,'I','K');
insert into stack_trace (i,op,val) values (18,'I','L');
insert into stack_trace (i,op,val) values (19,'I','M');
insert into stack_trace (i,op)     values (20,'O');
insert into stack_trace (i,op,val) values (21,'I','N');
insert into stack_trace (i,op)     values (22,'O');
insert into stack_trace (i,op,val) values (23,'I','O');
insert into stack_trace (i,op)     values (24,'O');
insert into stack_trace (i,op,val) values (25,'I','P');
insert into stack_trace (i,op)     values (26,'O');
insert into stack_trace (i,op)     values (27,'O');
insert into stack_trace (i,op,val) values (28,'I','Q');
insert into stack_trace (i,op,val) values (29,'I','R');
insert into stack_trace (i,op)     values (30,'O');
insert into stack_trace (i,op)     values (31,'O');
insert into stack_trace (i,op)     values (32,'O');
insert into stack_trace (i,op)     values (33,'O');
insert into stack_trace (i,op)     values (34,'O');
insert into stack_trace (i,op)     values (35,'O');
insert into stack_trace (i,op,val) values (36,'I','S');
insert into stack_trace (i,op)     values (37,'O');
insert into stack_trace (i,op)     values (38,'O');
insert into stack_trace (i,op,val) values (39,'I','T');
insert into stack_trace (i,op,val) values (40,'I','U');
insert into stack_trace (i,op)     values (41,'O');
insert into stack_trace (i,op,val) values (42,'I','V');
insert into stack_trace (i,op,val) values (43,'I','W');
insert into stack_trace (i,op,val) values (44,'I','X');
insert into stack_trace (i,op)     values (45,'O');
insert into stack_trace (i,op)     values (46,'O');
insert into stack_trace (i,op,val) values (47,'I','Y');
insert into stack_trace (i,op)     values (48,'O');
insert into stack_trace (i,op)     values (49,'O');
insert into stack_trace (i,op,val) values (50,'I','Z');
insert into stack_trace (i,op)     values (51,'O');
insert into stack_trace (i,op)     values (52,'O');

要求的结果

i   top_of_stack_val
--  ----------------
1   A
2   B
3   C
4   D
5   E
6   D
7   C
8   B
9   F
10  B
11  G
12  H
13  G
14  B
15  I
16  J
17  K
18  L
19  M
20  L
21  N
22  L
23  O
24  L
25  P
26  L
27  K
28  Q
29  R
30  Q
31  K
32  J
33  I
34  B
35  A
36  S
37  A
38  
39  T
40  U
41  T
42  V
43  W
44  X
45  W
46  V
47  Y
48  V
49  T
50  Z
51  T
52  

这其实是个有趣的问题。我要做的是跟踪堆栈中每个元素的位置。您可以使用累积总和来执行此操作:

select st.*,
       sum(case when op = 'I' then 1 else -1 end) over (order by i) as pos
from stack_trace st;

唉,在这一点上,我认为您需要一个相当复杂的连接或子查询来找出 pos 引用的最新值。这是一种方法:

with st as (
      select st.*,
             sum(case when op = 'I' then 1 else -1 end) over (order by i) as pos
      from stack_trace st
     )
select st.*,
       (select val
        from st st2
        where st2.i <= st.id and st2.pos = st.pos and
              st2.val is not null
        order by i desc
        fetch first 1 row only
       ) as top_of_stack_val
from st;

这是一个很好的谜题。

由于我的主要 DBMS 是 Teradata,因此我使用分析函数(需要 TD14.10+)为其编写了一个解决方案:

SELECT dt.*,
   -- find the last item in the stack with the same position
   Last_Value(val IGNORE NULLS)
   Over (PARTITION BY pos
         ORDER BY i) AS top_of_stack_val
FROM 
 ( 
   SELECT st.*,
      -- calculate the number of items in the stack
      Sum(CASE WHEN op = 'I' THEN 1 ELSE -1 end) 
      Over (ORDER BY i
            ROWS Unbounded Preceding) AS pos
   FROM stack_trace AS st
 ) AS dt;

此解决方案也适用于 Oracle,但 PostgreSQL 和 SQL 服务器不支持 LAST_VALUEIGNORE NULLS 选项并且模拟它非常复杂,例如参见 Itzk Ben-Gan 的 The Last non NULL Puzzle

编辑:事实上并没有那么复杂,我忘记了 Itzik 的第二个解决方案,旧的搭载技巧 ;-)

Martin Smith 的方法适用于所有四种 DBMS。

我个人怀疑您最终是否会发现 SQL 可以在所有 SQL 服务器、Teradata、Postgres 和 Oracle 中使用,并且具有可接受的 table 性能如果 table 很大。

一个SQL服务器解决方案(demo)如下

SELECT i,
       SUBSTRING(MAX(FORMAT(i, 'D10') + val) OVER (PARTITION BY Pos ORDER BY i 
                                                     ROWS UNBOUNDED PRECEDING), 11, 8000)
FROM   (SELECT st.*,
               sum(CASE WHEN op = 'I' THEN 1 ELSE -1 END) 
                  OVER (ORDER BY i ROWS UNBOUNDED PRECEDING) AS pos
        FROM   stack_trace st) t1
ORDER  BY i; 

虽然它确实需要一个额外的步骤 -

HiveTeradataOracle[的通用解决方案=25=]服务器PostgreSQL

select      s.i

           ,min (s.val) over
            (
                partition by    s.depth
                               ,s.depth_val_seq
            )                                   as top_of_stack_val            

from       (select      s.i
                       ,s.val
                       ,s.depth
                        
                       ,count (s.val) over 
                        (
                            partition by    s.depth
                            order by        s.i
                            rows            between unbounded preceding and current row
                        )                                                                   as depth_val_seq
                         
            from       (select      s.i
                                   ,s.val
                                    
                                   ,sum (case s.op when 'I' then 1 else -1 end)   over
                                    (
                                        order by        s.i
                                        rows            between unbounded preceding and current row
                                    )                                                                   as depth
             
                        from        stack_trace s
                        )
                        s
            )
            s
 
order by    i
;

天睿

select      s.i

           ,first_value (s.val) over 
            (
                partition by    s.depth
                order by        s.i
                reset when      s.op = 'I'
            )                                as top_of_stack_val

from       (select      s.i
                       ,s.val
                       ,s.op

                       ,sum (case s.op when 'I' then 1 else -1 end)   over
                        (
                            order by        s.i
                            rows            unbounded preceding
                        )                                                   as depth

            from        stack_trace s
            )
            s
            
order by    i
;