Merge Datetime Ranges Oracle SQL or PL/SQL
I have been struggling to merge datetime ranges in Oracle SQL or PL/SQL (Database Standard Edition 11gR2).
I am trying to merge datetime ranges so that the following data
order_id start_date_time end_date_time
3933 04/02/2020 08:00:00 04/02/2020 12:00:00
3933 04/02/2020 13:30:00 04/02/2020 17:00:00
3933 04/02/2020 14:00:00 04/02/2020 19:00:00
3933 05/02/2020 13:40:12 05/02/2020 14:34:48
3933 05/02/2020 14:00:00 05/02/2020 18:55:12
3933 05/02/2020 14:49:48 05/02/2020 15:04:48
3933 06/02/2020 08:00:00 06/02/2020 12:00:00
3933 06/02/2020 13:30:00 06/02/2020 17:00:00
3933 06/02/2020 14:10:12 06/02/2020 18:49:48
3933 07/02/2020 08:00:00 07/02/2020 10:30:00
3933 07/02/2020 08:00:00 07/02/2020 12:00:00
3933 07/02/2020 13:30:00 07/02/2020 17:00:00
11919 14/05/2020 09:00:00 14/05/2020 17:00:00
11919 14/05/2020 09:00:00 14/05/2020 17:00:00
11919 14/05/2020 15:00:00 14/05/2020 16:30:00
11919 15/05/2020 08:40:12 15/05/2020 16:30:00
11919 15/05/2020 09:40:12 15/05/2020 16:30:00
11919 15/05/2020 10:15:00 15/05/2020 12:15:00
11919 15/05/2020 13:19:48 15/05/2020 16:00:00
11919 18/05/2020 08:49:48 18/05/2020 09:45:00
11919 18/05/2020 10:00:00 18/05/2020 17:00:00
11919 18/05/2020 10:00:00 18/05/2020 16:58:12
11919 18/05/2020 15:34:48 18/05/2020 16:10:12
11919 18/05/2020 16:30:00 18/05/2020 16:45:00
... ... ...
would be transformed into the following result set:
--after merge (this is the result I am seeking)
order_id start_date_time end_date_time
3933 04/02/2020 08:00:00 04/02/2020 12:00:00
3933 04/02/2020 13:30:00 04/02/2020 19:00:00
3933 05/02/2020 13:40:12 05/02/2020 18:55:12
3933 06/02/2020 08:00:00 06/02/2020 12:00:00
3933 06/02/2020 13:30:00 06/02/2020 18:49:48
3933 07/02/2020 08:00:00 07/02/2020 12:00:00
3933 07/02/2020 13:30:00 07/02/2020 17:00:00
11919 14/05/2020 09:00:00 14/05/2020 17:00:00
11919 15/05/2020 08:40:12 15/05/2020 16:30:00
11919 18/05/2020 08:49:48 18/05/2020 17:00:00
... ... ...
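The format of start_date_time and end_date_time is DAY/MONTH/YEAR HH24:MI:SS.
Any suggestion or solution for how to do this merge in Oracle SQL or PL/SQL?
I thought this would be a trivial problem, but I have not been able to find a solution on the internet.
Thanks in advance.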
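This is adapted from an earlier answer, which includes an explanation of the code. All that has changed is adding PARTITION BY order_id to calculate the date ranges for each order_id, and then returning the ranges (rather than calculating a total value, as the linked answer does):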
SELECT order_id,
       start_date_time,
       end_date_time
FROM   (
  -- Pair each range-closing row with the range-opening row before it.
  SELECT order_id,
         LAG( dt ) OVER ( PARTITION BY order_id ORDER BY dt ) AS start_date_time,
         dt AS end_date_time,
         start_end
  FROM   (
    -- value is +1 for a start and -1 for an end; the running total counts open ranges.
    SELECT order_id,
           dt,
           CASE SUM( value ) OVER ( PARTITION BY order_id ORDER BY dt ASC, value DESC, ROWNUM ) * value
             WHEN 1 THEN 'start'
             WHEN 0 THEN 'end'
           END AS start_end
    FROM   table_name
    UNPIVOT ( dt FOR value IN ( start_date_time AS 1, end_date_time AS -1 ) )
  )
  WHERE  start_end IS NOT NULL
)
WHERE  start_end = 'end';
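To see how the running total drives the 'start' and 'end' labels, it can help to look at the intermediate rows for a single day. The query below is only an illustrative sketch: the open_ranges alias and the filter on order 3933 and 4 February 2020 are assumptions added for this example and are not part of the answer's query.

```sql
-- Illustrative only: expose the running count of open ranges for one order and one day.
-- open_ranges is a hypothetical alias; the WHERE filter just keeps the output short.
SELECT order_id,
       dt,
       value,
       SUM( value ) OVER ( PARTITION BY order_id ORDER BY dt ASC, value DESC, ROWNUM ) AS open_ranges
FROM   table_name
UNPIVOT ( dt FOR value IN ( start_date_time AS 1, end_date_time AS -1 ) )
WHERE  order_id = 3933
AND    dt < DATE '2020-02-05'
ORDER BY dt, value DESC;

-- With the sample data, the six events for that day count up and down like this:
--   08:00 (+1) -> 1    12:00 (-1) -> 0
--   13:30 (+1) -> 1    14:00 (+1) -> 2
--   17:00 (-1) -> 1    19:00 (-1) -> 0
```

A start row is kept only when the count it produces is 1 (no other range was open), and an end row only when the count drops to 0. The rows at 14:00 and 17:00 fall inside an already open range, so the CASE expression returns NULL for them and they are filtered out.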
From Oracle 12 onward, you can use MATCH_RECOGNIZE to do row-by-row processing:
SELECT *
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY order_id
  ORDER BY start_date_time
  MEASURES
    FIRST(start_date_time) AS start_date_time,
    MAX(end_date_time) AS end_date_time
  ONE ROW PER MATCH
  PATTERN (overlapping_rows* last_row)
  DEFINE
    overlapping_rows AS NEXT(start_date_time) <= MAX(end_date_time)
)
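If it is unclear which input rows end up in which merged range, one option is to switch to ALL ROWS PER MATCH and expose the match number and the pattern variable each row was classified as. This is a debugging sketch under the same pattern and DEFINE clause as above; the match_no and matched_as aliases are hypothetical names chosen for this example.

```sql
-- Debugging sketch: list every input row together with the match it belongs to.
-- MATCH_NUMBER() numbers the matches within each partition;
-- CLASSIFIER() reports which pattern variable the row was mapped to.
SELECT order_id,
       start_date_time,
       end_date_time,
       match_no,
       matched_as
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY order_id
  ORDER BY start_date_time
  MEASURES
    MATCH_NUMBER() AS match_no,
    CLASSIFIER()   AS matched_as
  ALL ROWS PER MATCH
  PATTERN (overlapping_rows* last_row)
  DEFINE
    overlapping_rows AS NEXT(start_date_time) <= MAX(end_date_time)
)
ORDER BY order_id, start_date_time;
```

Rows that share an order_id and a match_no are the ones collapsed into a single output row by the ONE ROW PER MATCH version.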
Which, for your test data:
CREATE TABLE table_name (
  order_id        NUMBER,
  start_date_time DATE,
  end_date_time   DATE
);
INSERT INTO table_name ( order_id, start_date_time, end_date_time )
SELECT 3933, TIMESTAMP '2020-02-04 08:00:00', TIMESTAMP '2020-02-04 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-04 13:30:00', TIMESTAMP '2020-02-04 17:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-04 14:00:00', TIMESTAMP '2020-02-04 19:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 13:40:12', TIMESTAMP '2020-02-05 14:34:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 14:00:00', TIMESTAMP '2020-02-05 18:55:12' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-05 14:49:48', TIMESTAMP '2020-02-05 15:04:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 08:00:00', TIMESTAMP '2020-02-06 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 13:30:00', TIMESTAMP '2020-02-06 17:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-06 14:10:12', TIMESTAMP '2020-02-06 18:49:48' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 08:00:00', TIMESTAMP '2020-02-07 10:30:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 08:00:00', TIMESTAMP '2020-02-07 12:00:00' FROM DUAL UNION ALL
SELECT 3933, TIMESTAMP '2020-02-07 13:30:00', TIMESTAMP '2020-02-07 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 09:00:00', TIMESTAMP '2020-05-14 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 09:00:00', TIMESTAMP '2020-05-14 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-14 15:00:00', TIMESTAMP '2020-05-14 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 08:40:12', TIMESTAMP '2020-05-15 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 09:40:12', TIMESTAMP '2020-05-15 16:30:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 10:15:00', TIMESTAMP '2020-05-15 12:15:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-15 13:19:48', TIMESTAMP '2020-05-15 16:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 08:49:48', TIMESTAMP '2020-05-18 09:45:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 10:00:00', TIMESTAMP '2020-05-18 17:00:00' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 10:00:00', TIMESTAMP '2020-05-18 16:58:12' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 15:34:48', TIMESTAMP '2020-05-18 16:10:12' FROM DUAL UNION ALL
SELECT 11919, TIMESTAMP '2020-05-18 16:30:00', TIMESTAMP '2020-05-18 16:45:00' FROM DUAL;
Both queries output:
ORDER_ID | START_DATE_TIME | END_DATE_TIME
-------: | :------------------ | :------------------
3933 | 2020-02-04 08:00:00 | 2020-02-04 12:00:00
3933 | 2020-02-04 13:30:00 | 2020-02-04 19:00:00
3933 | 2020-02-05 13:40:12 | 2020-02-05 18:55:12
3933 | 2020-02-06 08:00:00 | 2020-02-06 12:00:00
3933 | 2020-02-06 13:30:00 | 2020-02-06 18:49:48
3933 | 2020-02-07 08:00:00 | 2020-02-07 12:00:00
3933 | 2020-02-07 13:30:00 | 2020-02-07 17:00:00
11919 | 2020-05-14 09:00:00 | 2020-05-14 17:00:00
11919 | 2020-05-15 08:40:12 | 2020-05-15 16:30:00
11919 | 2020-05-18 08:49:48 | 2020-05-18 09:45:00
11919 | 2020-05-18 10:00:00 | 2020-05-18 17:00:00
db<>fiddle here
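The solution below uses a common technique known as the "start of group" method.
The idea is to order the intervals by start date (separately for each id) and assign them to groups as follows. For each interval, check whether its start time is strictly greater than the maximum end time of all preceding intervals. If it is, the interval starts a new group. The rest is easy: just select the MIN start date and the MAX end date for each group.
Here is the approach implemented with analytic functions: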
with
  has_sog_flags (order_id, start_date_time, end_date_time, flag) as (
    select order_id, start_date_time, end_date_time,
           case when start_date_time >
                     max(end_date_time) over (partition by order_id
                                              order by start_date_time
                                              rows between unbounded preceding and 1 preceding)
                then 1 end
    from   table_name
  )
, has_groups (order_id, start_date_time, end_date_time, grp) as (
    select order_id, start_date_time, end_date_time,
           sum(flag) over (partition by order_id order by start_date_time)
    from   has_sog_flags
  )
select   order_id, min(start_date_time) as start_date_time,
         max(end_date_time) as end_date_time
from     has_groups
group by order_id, grp
order by order_id, start_date_time
;
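An interesting question is how to handle open-ended intervals (for example, a null end_date_time meaning "open ended into the future"). The query can be adapted relatively easily to cover this extension of the problem statement.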
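One possible adaptation for open-ended intervals, offered only as a sketch: MAX ignores nulls, so a null end_date_time would otherwise never extend a group. The version below swaps nulls for a far-future sentinel before grouping and turns the sentinel back into null at the end. The closed_ranges name and the DATE '9999-12-31' sentinel are assumptions for this example, and the sentinel must not occur as a real end date in the data.

```sql
-- Sketch: "start of group" with NULL end_date_time treated as "open ended into the future".
-- Assumption: DATE '9999-12-31' never appears as a genuine end_date_time.
with
  closed_ranges (order_id, start_date_time, end_date_time) as (
    select order_id, start_date_time,
           nvl(end_date_time, date '9999-12-31')   -- open end becomes the sentinel
    from   table_name
  )
, has_sog_flags (order_id, start_date_time, end_date_time, flag) as (
    select order_id, start_date_time, end_date_time,
           case when start_date_time >
                     max(end_date_time) over (partition by order_id
                                              order by start_date_time
                                              rows between unbounded preceding and 1 preceding)
                then 1 end
    from   closed_ranges
  )
, has_groups (order_id, start_date_time, end_date_time, grp) as (
    select order_id, start_date_time, end_date_time,
           sum(flag) over (partition by order_id order by start_date_time)
    from   has_sog_flags
  )
select   order_id,
         min(start_date_time) as start_date_time,
         nullif(max(end_date_time), date '9999-12-31') as end_date_time  -- sentinel back to NULL
from     has_groups
group by order_id, grp
order by order_id, start_date_time
;
```

With this change, once an open-ended interval appears for an order_id, every later interval for that order falls into the same group, and the merged row reports a null end_date_time.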