Group items in a way that sets of repeating values are segmented

I have a table of arrivals and departures for a patient's hospital stay:

ENC_ID    ARRIVE                  DEPART                  UNIT      LEVEL
123456789 2018-07-16 17:53:00.000 2018-07-17 06:30:00.000 ICU       TRAUMA
123456789 2018-07-17 06:30:00.000 2018-07-17 09:05:00.000 PERI OP   TRAUMA
123456789 2018-07-17 09:05:00.000 2018-07-18 09:06:00.000 ICU       TRAUMA
123456789 2018-07-18 23:14:00.000 2018-07-23 07:33:00.000 UNIT 5    NULL
123456789 2018-07-23 07:33:00.000 2018-07-23 14:57:00.000 ICU       TRAUMA
123456789 2018-07-23 14:57:00.000 2018-07-30 11:06:00.000 INTRA OP  TRAUMA
123456789 2018-07-30 11:06:00.000 2018-07-31 11:06:00.000 UNIT 5    NULL

I need to combine the records based on LEVEL:

ENC_ID    MIN(ARRIVE)         MAX(DEPART)         LEVEL
123456789 2018-07-16 17:53:00 2018-07-18 09:06:00 TRAUMA
123456789 2018-07-18 23:14:00 2018-07-23 07:33:00 NULL
123456789 2018-07-23 07:33:00 2018-07-30 11:06:00 TRAUMA
123456789 2018-07-30 11:06:00 2018-07-31 11:06:00 NULL

I was hoping to use DENSE_RANK to create a SEQ number for each set of LEVEL values, which I could later use in a GROUP BY:

ENC_ID    ARRIVED                 DEPARTED                UNIT     LEVEL   SEQ
159939879 2018-07-16 17:53:00.000 2018-07-17 06:30:00.000 ICU      TRAUMA  1
159939879 2018-07-17 06:30:00.000 2018-07-17 09:05:00.000 PERI OP  TRAUMA  1
159939879 2018-07-17 09:05:00.000 2018-07-18 09:06:00.000 ICU      TRAUMA  1
159939879 2018-07-18 23:14:00.000 2018-07-23 07:33:00.000 UNIT 5   NULL    2
159939879 2018-07-23 07:33:00.000 2018-07-23 14:57:00.000 ICU      TRAUMA  3
159939879 2018-07-23 14:57:00.000 2018-07-30 11:06:00.000 INTRA OP TRAUMA  3
159939879 2018-07-30 11:06:00.000 2018-07-31 11:06:00.000 UNIT 5   NULL    4

But DENSE_RANK() over (partition by ENC_ID order by LEVEL) does not distinguish the sets of LEVEL values in a way I can use:

ENC_ID    ARRIVED                 DEPARTED                UNIT     LEVEL   DR
159939879 2018-07-16 17:53:00.000 2018-07-17 06:30:00.000 ICU      TRAUMA  2
159939879 2018-07-17 06:30:00.000 2018-07-17 09:05:00.000 PERI OP  TRAUMA  2
159939879 2018-07-17 09:05:00.000 2018-07-18 09:06:00.000 ICU      TRAUMA  2
159939879 2018-07-18 23:14:00.000 2018-07-23 07:33:00.000 UNIT 5   NULL    1
159939879 2018-07-23 07:33:00.000 2018-07-23 14:57:00.000 ICU      TRAUMA  2
159939879 2018-07-23 14:57:00.000 2018-07-30 11:06:00.000 INTRA OP TRAUMA  2
159939879 2018-07-30 11:06:00.000 2018-07-31 11:06:00.000 UNIT 5   NULL    1

Is there a way to do this?

This is a gaps-and-islands problem. You want to count how many times LEVEL changes up to each row. You can use lag() and a cumulative sum:

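-- Column names follow the first sample table (ARRIVE, DEPART).
-- seq goes up by 1 whenever LEVEL differs from the previous row's LEVEL.
select t.*,
       sum(case when prev_level = level then 0 else 1 end) over (partition by enc_id order by arrive) as seq
from (select t.*,
             lag(level) over (partition by enc_id order by arrive) as prev_level
      from t
     ) t;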

You can then aggregate by this value.
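
As a rough sketch of that aggregation step, the snippet below loads the question's sample rows into an in-memory SQLite database and groups by the running sum. The SQLite harness, the alias names seq, first_arrive and last_depart, and the ARRIVE/DEPART column names are assumptions made for illustration; the question does not say which database is in use, and SQLite needs version 3.25 or later for window functions.

```python
import sqlite3

# Sample rows copied from the question; the table name `t` follows the answer's query.
rows = [
    (123456789, "2018-07-16 17:53:00", "2018-07-17 06:30:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-17 06:30:00", "2018-07-17 09:05:00", "PERI OP", "TRAUMA"),
    (123456789, "2018-07-17 09:05:00", "2018-07-18 09:06:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-18 23:14:00", "2018-07-23 07:33:00", "UNIT 5", None),
    (123456789, "2018-07-23 07:33:00", "2018-07-23 14:57:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-23 14:57:00", "2018-07-30 11:06:00", "INTRA OP", "TRAUMA"),
    (123456789, "2018-07-30 11:06:00", "2018-07-31 11:06:00", "UNIT 5", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (enc_id integer, arrive text, depart text, unit text, level text)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)

# Innermost query: previous LEVEL per encounter.
# Middle query: running count of LEVEL changes (seq).
# Outer query: one row per (enc_id, seq) island.
query = """
select enc_id, min(arrive) as first_arrive, max(depart) as last_depart, level
from (select t.*,
             sum(case when prev_level = level then 0 else 1 end)
                 over (partition by enc_id order by arrive) as seq
      from (select t.*,
                   lag(level) over (partition by enc_id order by arrive) as prev_level
            from t
           ) t
     ) t
group by enc_id, seq, level
order by enc_id, first_arrive
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

On this sample the query returns four rows, matching the combined records requested in the question.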

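Note that the above may not handle adjacent NULL values the way you want, because a comparison with NULL is never true and each NULL row therefore starts a new group. So the difference of row numbers may work better: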

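-- Column names follow the first sample table (ARRIVE, DEPART).
-- Within a run of equal LEVEL values, seqnum - seqnum_l stays constant, so it identifies the run.
select enc_id, min(arrive), max(depart), level,
       row_number() over (partition by enc_id order by min(arrive)) as seq
from (select t.*,
             row_number() over (partition by enc_id, level order by arrive) as seqnum_l,
             row_number() over (partition by enc_id order by arrive) as seqnum
      from t
     ) t
group by enc_id, (seqnum - seqnum_l), level;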
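
To illustrate how the two approaches differ on adjacent NULL values, here is a small sketch that runs both against an in-memory SQLite database. The four rows are hypothetical (integers stand in for timestamps), and the SQLite harness and the aliases seq and first_arrive are assumptions for illustration, not part of the original question. SQLite 3.25 or later is needed for window functions.

```python
import sqlite3

# Hypothetical rows with two adjacent NULL levels; integers stand in for timestamps.
rows = [
    (1, 1, 2, "TRAUMA"),
    (1, 2, 3, None),
    (1, 3, 4, None),
    (1, 4, 5, "TRAUMA"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (enc_id integer, arrive integer, depart integer, level text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# lag() + cumulative sum: NULL = NULL is not true, so every NULL row starts a new group.
lag_groups = conn.execute("""
select enc_id, min(arrive) as first_arrive, max(depart), level
from (select t.*,
             sum(case when prev_level = level then 0 else 1 end)
                 over (partition by enc_id order by arrive) as seq
      from (select t.*,
                   lag(level) over (partition by enc_id order by arrive) as prev_level
            from t
           ) t
     ) t
group by enc_id, seq, level
order by first_arrive
""").fetchall()

# Difference of row numbers: PARTITION BY places NULLs in the same partition,
# so the two adjacent NULL rows get the same (seqnum - seqnum_l) and form one group.
rn_groups = conn.execute("""
select enc_id, min(arrive) as first_arrive, max(depart), level
from (select t.*,
             row_number() over (partition by enc_id, level order by arrive) as seqnum_l,
             row_number() over (partition by enc_id order by arrive) as seqnum
      from t
     ) t
group by enc_id, (seqnum - seqnum_l), level
order by first_arrive
""").fetchall()

print(lag_groups)  # four groups: each NULL row stands alone
print(rn_groups)   # three groups: the NULL rows are merged
```

On these rows the row-number difference is 0 for the first TRAUMA row, 1 for both NULL rows, and 2 for the last TRAUMA row, which is why the NULL rows collapse into a single group spanning 2 to 4.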