Group items so that sets of repeating values are segmented
I have a table of patient arrivals and departures during a hospital stay:
ENC_ID     ARRIVE                   DEPART                   UNIT      LEVEL
123456789  2018-07-16 17:53:00.000  2018-07-17 06:30:00.000  ICU       TRAUMA
123456789  2018-07-17 06:30:00.000  2018-07-17 09:05:00.000  PERI OP   TRAUMA
123456789  2018-07-17 09:05:00.000  2018-07-18 09:06:00.000  ICU       TRAUMA
123456789  2018-07-18 23:14:00.000  2018-07-23 07:33:00.000  UNIT 5    NULL
123456789  2018-07-23 07:33:00.000  2018-07-23 14:57:00.000  ICU       TRAUMA
123456789  2018-07-23 14:57:00.000  2018-07-30 11:06:00.000  INTRA OP  TRAUMA
123456789  2018-07-30 11:06:00.000  2018-07-31 11:06:00.000  UNIT 5    NULL
I need to combine consecutive records that share the same LEVEL:
ENC_ID     MIN(ARRIVE)          MAX(DEPART)          LEVEL
123456789  2018-07-16 17:53:00  2018-07-18 09:06:00  TRAUMA
123456789  2018-07-18 23:14:00  2018-07-23 07:33:00  NULL
123456789  2018-07-23 07:33:00  2018-07-30 11:06:00  TRAUMA
123456789  2018-07-30 11:06:00  2018-07-31 11:06:00  NULL
I was hoping to use DENSE_RANK to create a SEQ number for each set of LEVEL values, which I could later use in a GROUP BY:
ENC_ID     ARRIVED                  DEPARTED                 UNIT      LEVEL   SEQ
159939879  2018-07-16 17:53:00.000  2018-07-17 06:30:00.000  ICU       TRAUMA  1
159939879  2018-07-17 06:30:00.000  2018-07-17 09:05:00.000  PERI OP   TRAUMA  1
159939879  2018-07-17 09:05:00.000  2018-07-18 09:06:00.000  ICU       TRAUMA  1
159939879  2018-07-18 23:14:00.000  2018-07-23 07:33:00.000  UNIT 5    NULL    2
159939879  2018-07-23 07:33:00.000  2018-07-23 14:57:00.000  ICU       TRAUMA  3
159939879  2018-07-23 14:57:00.000  2018-07-30 11:06:00.000  INTRA OP  TRAUMA  3
159939879  2018-07-30 11:06:00.000  2018-07-31 11:06:00.000  UNIT 5    NULL    4
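But DENSE_RANK() over (partition by ENC_ID order by LEVEL) does not separate the LEVEL sets in a way I can use, because every row with the same LEVEL gets the same rank regardless of where it falls in time: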
ENC_ID     ARRIVED                  DEPARTED                 UNIT      LEVEL   DR
159939879  2018-07-16 17:53:00.000  2018-07-17 06:30:00.000  ICU       TRAUMA  2
159939879  2018-07-17 06:30:00.000  2018-07-17 09:05:00.000  PERI OP   TRAUMA  2
159939879  2018-07-17 09:05:00.000  2018-07-18 09:06:00.000  ICU       TRAUMA  2
159939879  2018-07-18 23:14:00.000  2018-07-23 07:33:00.000  UNIT 5    NULL    1
159939879  2018-07-23 07:33:00.000  2018-07-23 14:57:00.000  ICU       TRAUMA  2
159939879  2018-07-23 14:57:00.000  2018-07-30 11:06:00.000  INTRA OP  TRAUMA  2
159939879  2018-07-30 11:06:00.000  2018-07-31 11:06:00.000  UNIT 5    NULL    1
Is there a way to do this?
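This is a gaps-and-islands problem. You want to count how many times LEVEL has changed up to each row. You can use lag() and a cumulative sum: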
-- seq goes up by 1 on every row whose level differs from the previous row's
select t.*,
       sum(case when prev_level = level then 0 else 1 end)
           over (partition by enc_id order by arrived) as seq
from (select t.*,
             lag(level) over (partition by enc_id order by arrived) as prev_level
      from t
     ) t;
You can then aggregate by this value.
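As a rough illustration of that aggregation step, here is a minimal sketch that runs the lag() and cumulative-sum query against the sample rows and then groups by the resulting value. It assumes SQLite 3.25 or later (driven from Python's sqlite3 module) rather than the asker's database, a table named t, and columns named ARRIVED and DEPARTED; the alias seq is also an assumption made for this example.

```python
import sqlite3

# Sample rows from the question (milliseconds dropped for brevity).
rows = [
    (123456789, "2018-07-16 17:53:00", "2018-07-17 06:30:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-17 06:30:00", "2018-07-17 09:05:00", "PERI OP", "TRAUMA"),
    (123456789, "2018-07-17 09:05:00", "2018-07-18 09:06:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-18 23:14:00", "2018-07-23 07:33:00", "UNIT 5", None),
    (123456789, "2018-07-23 07:33:00", "2018-07-23 14:57:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-23 14:57:00", "2018-07-30 11:06:00", "INTRA OP", "TRAUMA"),
    (123456789, "2018-07-30 11:06:00", "2018-07-31 11:06:00", "UNIT 5", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (enc_id integer, arrived text, departed text, unit text, level text)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)

# Innermost query: previous row's level. Middle query: running count of
# changes (seq). Outer query: one row per (enc_id, seq) group.
query = """
select enc_id, min(arrived), max(departed), level, seq
from (select t.*,
             sum(case when prev_level = level then 0 else 1 end)
                 over (partition by enc_id order by arrived) as seq
      from (select t.*,
                   lag(level) over (partition by enc_id order by arrived) as prev_level
            from t
           ) t
     ) t
group by enc_id, seq, level
order by enc_id, seq
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

On this data the query should return four groups, matching the desired output in the question, because no two NULL rows are adjacent here.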
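Note that the above may not handle adjacent NULL values the way you want: prev_level = level is never true when either side is NULL, so every row with a NULL level starts a new group. The difference of row numbers may therefore work better: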
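-- assumes the columns are named ARRIVED and DEPARTED
-- (seqnum - seqnum_l) is constant within each run of consecutive equal levels
select enc_id, min(arrived), max(departed), level,
       row_number() over (partition by enc_id order by min(arrived)) as seq
from (select t.*,
             row_number() over (partition by enc_id, level order by arrived) as seqnum_l,
             row_number() over (partition by enc_id order by arrived) as seqnum
      from t
     ) t
group by enc_id, (seqnum - seqnum_l), level;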
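To see why the difference of row numbers identifies each run, the sketch below prints seqnum - seqnum_l for every row and then groups by it. It again assumes SQLite 3.25 or later via Python's sqlite3, a table named t, and ARRIVED/DEPARTED column names. The last row (UNIT 6) is hypothetical and not part of the question's data; it is added only so that two NULL levels sit next to each other.

```python
import sqlite3

# Rows from the question, plus one HYPOTHETICAL trailing row (UNIT 6)
# that makes two NULL levels adjacent.
rows = [
    (123456789, "2018-07-16 17:53:00", "2018-07-17 06:30:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-17 06:30:00", "2018-07-17 09:05:00", "PERI OP", "TRAUMA"),
    (123456789, "2018-07-17 09:05:00", "2018-07-18 09:06:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-18 23:14:00", "2018-07-23 07:33:00", "UNIT 5", None),
    (123456789, "2018-07-23 07:33:00", "2018-07-23 14:57:00", "ICU", "TRAUMA"),
    (123456789, "2018-07-23 14:57:00", "2018-07-30 11:06:00", "INTRA OP", "TRAUMA"),
    (123456789, "2018-07-30 11:06:00", "2018-07-31 11:06:00", "UNIT 5", None),
    (123456789, "2018-07-31 11:06:00", "2018-08-01 09:00:00", "UNIT 6", None),  # hypothetical
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (enc_id integer, arrived text, departed text, unit text, level text)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)

# seqnum counts rows per encounter; seqnum_l counts rows per encounter and level.
# PARTITION BY places all NULL levels in the same partition.
numbered = """
select t.*,
       row_number() over (partition by enc_id, level order by arrived) as seqnum_l,
       row_number() over (partition by enc_id order by arrived) as seqnum
from t
"""

# Per-row difference, in arrival order: constant inside each run.
diffs = [
    r[0]
    for r in conn.execute(
        f"select seqnum - seqnum_l from ({numbered}) t order by arrived"
    )
]
print(diffs)

# One row per run: group by the difference together with the level.
islands = conn.execute(f"""
select enc_id, min(arrived), max(departed), level
from ({numbered}) t
group by enc_id, seqnum - seqnum_l, level
order by min(arrived)
""").fetchall()

for row in islands:
    print(row)
```

With the hypothetical row included, the two trailing NULL rows share the same difference and collapse into a single group, whereas the lag() comparison would count each of them as a separate change.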