Count days within date ranges while excluding overlapping days
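I'm looking for the number of days within several date ranges. I've been summing the days with the datediff function, but now I want to exclude overlapping days. So, from the earliest date up to curdate, I want the number of days covered by the date ranges, with each day counted only once even when it falls into overlapping ranges.
My table looks like this: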
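| Person_id | Start_date          | End_date            | Count |
|-----------|---------------------|---------------------|-------|
| 83244     | 2014-09-01 00:00:00 | 2014-09-06 00:00:00 | 5     |
| 83244     | 2014-09-08 00:00:00 | 2015-09-07 00:00:00 | 364   |
| 83244     | 2015-01-15 00:00:00 | 2015-02-01 00:00:00 | 17    |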
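If I sum these I get 386 (5 + 364 + 17), but the answer I'm looking for is 369, because the last row lies entirely within the second row.
Does anyone have a solution?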
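As a quick check of the arithmetic in the question, here is a minimal Python sketch (an illustration outside MySQL, not part of the original post). It mirrors datediff(End_date, Start_date) with date subtraction and assumes, as datediff does, that each range covers the days from Start_date up to but not including End_date. The names ROWS, naive_total and distinct_days are hypothetical.

```python
from datetime import date, timedelta

# Sample rows for Person_id 83244 from the question: (Start_date, End_date).
ROWS = [
    (date(2014, 9, 1), date(2014, 9, 6)),
    (date(2014, 9, 8), date(2015, 9, 7)),
    (date(2015, 1, 15), date(2015, 2, 1)),
]

def naive_total(rows):
    """Sum of datediff(end, start) per row; overlapping days are counted twice."""
    return sum((end - start).days for start, end in rows)

def distinct_days(rows):
    """Count each calendar day once, treating every range as [start, end)."""
    days = set()
    for start, end in rows:
        for offset in range((end - start).days):
            days.add(start + timedelta(days=offset))
    return len(days)

print(naive_total(ROWS))    # 386
print(distinct_days(ROWS))  # 369
```

The set-based count is a brute-force reference only; the answers below compute the same figure inside MySQL.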
I padded your example with a second Person_id and shortened the column names a bit to keep the code short:
CREATE TABLE tbl(`pid` int, `sd` datetime, `ed` datetime);
INSERT INTO tbl (`pid`, `sd`, `ed`)
VALUES
(83244, '2014-09-01', '2014-09-06'),
(83244, '2014-09-08', '2015-09-07'),
(83243, '2014-08-08', '2015-08-15'),
(83243, '2014-08-11', '2015-09-03'),
(83244, '2015-01-15', '2015-02-01');
Working with the data above, you can then apply the following query:
SELECT pid,sd,ed,CASE WHEN @id!=pid THEN @id:=pid+0*(@ed:=Date('1970-1-1')) END id,
CASE WHEN sd<@ed THEN CASE WHEN ed>@ed THEN datediff(ed,@ed) ELSE 0 END
ELSE datediff(ed,sd) END days,
@ed:=CASE WHEN ed>@ed THEN ed ELSE @ed END enddt
FROM tbl,( select @id:=0 ) const
ORDER BY pid,sd
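Unlike other RDBMSs, MySQL has a certain "procedural feel" when it comes to select statements: you can actually use variables in them (@id and @ed) that change their state as the rows are processed (which is why the order by clause at the end is so important here).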
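The basic idea behind this query: for each pid, list its intervals in ascending order of start date (sd), and always keep the largest end date (ed) seen so far in the variable @ed. For each new interval, check whether it overlaps the previous ones, i.e. whether the current start date sd is earlier than the previous (maximum) end date @ed, and compute the interval's days accordingly: an overlapping interval only contributes the part after @ed (datediff(ed,@ed), or 0 if it does not extend beyond @ed), while a non-overlapping interval contributes datediff(ed,sd) in full.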
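The first case clause is needed to reset the variables @id and @ed whenever the current pid changes; @ed is reset to 1970-01-01, well before any of the dates in the data.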
The subquery const merely initializes the variable @id at the start.
The query produces the following result:
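| pid   | sd                  | ed                  | id    | days | enddt               |
|-------|---------------------|---------------------|-------|------|---------------------|
| 83243 | 2014-08-08 00:00:00 | 2015-08-15 00:00:00 | 83243 | 372  | 2015-08-15 00:00:00 |
| 83243 | 2014-08-11 00:00:00 | 2015-09-03 00:00:00 | NULL  | 19   | 2015-09-03 00:00:00 |
| 83244 | 2014-09-01 00:00:00 | 2014-09-06 00:00:00 | 83244 | 5    | 2014-09-06 00:00:00 |
| 83244 | 2014-09-08 00:00:00 | 2015-09-07 00:00:00 | NULL  | 364  | 2015-09-07 00:00:00 |
| 83244 | 2015-01-15 00:00:00 | 2015-02-01 00:00:00 | NULL  | 0    | 2015-09-07 00:00:00 |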
See the Demo here.
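For readers who want to trace the running-maximum logic row by row, here is a minimal Python sketch of it. This is a hypothetical illustration, not the answer's code: the generator overlap_free_days plays the role of the days column, and the local max_end stands in for @ed, reset to 1970-01-01 whenever the pid changes just like the first case clause.

```python
from datetime import date
from itertools import groupby

# (pid, sd, ed) rows from the tbl example above.
TBL = [
    (83244, date(2014, 9, 1), date(2014, 9, 6)),
    (83244, date(2014, 9, 8), date(2015, 9, 7)),
    (83243, date(2014, 8, 8), date(2015, 8, 15)),
    (83243, date(2014, 8, 11), date(2015, 9, 3)),
    (83244, date(2015, 1, 15), date(2015, 2, 1)),
]

def overlap_free_days(rows):
    """Yield (pid, sd, ed, days, enddt) in ORDER BY pid, sd order."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for pid, group in groupby(ordered, key=lambda r: r[0]):
        max_end = date(1970, 1, 1)  # plays the role of @ed, reset per pid
        for _, sd, ed in group:
            if sd < max_end:
                # Overlap: only the part beyond the previous maximum end counts.
                days = (ed - max_end).days if ed > max_end else 0
            else:
                days = (ed - sd).days
            max_end = max(max_end, ed)  # @ed := greater of ed and @ed
            yield pid, sd, ed, days, max_end

for pid, sd, ed, days, enddt in overlap_free_days(TBL):
    print(pid, sd, ed, days, enddt)
```

Its days values come out as 372, 19, 5, 364 and 0, the same as the days column in the result table above. Unlike the MySQL version, the order of evaluation here is explicit.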
If you are only interested in the totals, you can of course wrap the whole query in another grouping, like this:
SELECT pid,sum(days) FROM (
SELECT pid,sd,ed,CASE WHEN @id!=pid THEN @id:=pid+0*(@ed:=Date('1970-1-1')) END id,
CASE WHEN sd<@ed THEN CASE WHEN ed>@ed THEN datediff(ed,@ed) ELSE 0 END
ELSE datediff(ed,sd) END days,
@ed:=CASE WHEN ed>@ed THEN ed ELSE @ed END enddt
FROM tbl,( select @id:=0 ) const
ORDER BY pid,sd
) t GROUP BY pid ORDER BY pid
This gives you:
| pid   | sum(days) |
|-------|-----------|
| 83243 | 391       |
| 83244 | 369       |
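The outer query only adds up the per-row days column for each pid. A minimal Python sketch of that aggregation step, using the (pid, days) pairs copied from the per-row result above (the name sum_days_by_pid is hypothetical):

```python
from collections import defaultdict

# (pid, days) pairs from the per-row result of the inner query.
PER_ROW = [(83243, 372), (83243, 19), (83244, 5), (83244, 364), (83244, 0)]

def sum_days_by_pid(pairs):
    """Equivalent of SELECT pid, sum(days) ... GROUP BY pid ORDER BY pid."""
    totals = defaultdict(int)
    for pid, days in pairs:
        totals[pid] += days
    # Sorting the items reproduces the ORDER BY pid.
    return dict(sorted(totals.items()))

print(sum_days_by_pid(PER_ROW))  # {83243: 391, 83244: 369}
```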
This SQL returns the number of days without counting overlapping days twice:
select person_id, sum(days)
from (
select t1.person_id,
t1.start_date,
t1.end_date,
case when t1.end_date > coalesce(greatest(max(t2.end_date), t1.start_date), t1.start_date)
then datediff(t1.end_date, coalesce(greatest(max(t2.end_date), t1.start_date), t1.start_date))
else 0
end days
from t t1
left join t t2 on t1.person_id = t2.person_id
and (t2.start_date < t1.start_date
or t2.start_date = t1.start_date and t2.end_date < t1.end_date)
group by t1.person_id,
t1.start_date,
t1.end_date
) detail
group by person_id
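It requires the periods of a given person to be unique, i.e. no two periods of the same person may have both the same start_date and the same end_date; otherwise neither copy counts as "earlier" than the other, and both would be counted in full.
This fiddle returns 369 for the sample data and person.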
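The self-join can be traced with a small Python sketch: for every period, take the latest end among the same person's earlier periods (earlier start, or same start and earlier end), move the period's start up to it, and count what remains. This is an O(n²) illustration under the same half-open day counting as datediff; the function name days_without_overlap is hypothetical. The last line shows why identical periods must be excluded.

```python
from datetime import date

def days_without_overlap(periods):
    """Mirror of the self-join; periods is a list of (start, end) for one person."""
    total = 0
    for s1, e1 in periods:
        # t2 rows that sort strictly before t1 by (start_date, end_date).
        earlier_ends = [e2 for s2, e2 in periods
                        if s2 < s1 or (s2 == s1 and e2 < e1)]
        # coalesce(greatest(max(t2.end_date), t1.start_date), t1.start_date)
        cutoff = max(earlier_ends + [s1])
        total += (e1 - cutoff).days if e1 > cutoff else 0
    return total

PERSON_83244 = [
    (date(2014, 9, 1), date(2014, 9, 6)),
    (date(2014, 9, 8), date(2015, 9, 7)),
    (date(2015, 1, 15), date(2015, 2, 1)),
]

print(days_without_overlap(PERSON_83244))  # 369
# A duplicated period is counted twice: 369 + 5.
print(days_without_overlap(PERSON_83244 + [PERSON_83244[0]]))  # 374
```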
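Alternative
You could create a sequence table (which is useful for many purposes) and then use it to count the unique days.
As a one-time operation, extend your database model with an additional table that contains only the natural numbers (0, 1, 2, ...):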
create table sequence (
num int,
primary key (num)
);
-- Populate the above table with as many numbers as needed:
insert into sequence values(0);
insert into sequence select num+ 1 from sequence; -- 2 records
insert into sequence select num+ 2 from sequence; -- 4 records
insert into sequence select num+ 4 from sequence; -- 8 records
insert into sequence select num+ 8 from sequence; -- 16 records
insert into sequence select num+ 16 from sequence; -- 32 records
insert into sequence select num+ 32 from sequence; -- 64 records
insert into sequence select num+ 64 from sequence; -- 128 records
insert into sequence select num+ 128 from sequence; -- 256 records
insert into sequence select num+ 256 from sequence; -- 512 records
insert into sequence select num+ 512 from sequence; -- 1024 records
insert into sequence select num+1024 from sequence; -- 2048 records
insert into sequence select num+2048 from sequence; -- 4096 records
You could keep inserting records this way, but for the task at hand 4096 numbers (a little over 11 years of days) are enough.
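Each of those insert ... select statements copies the whole table shifted by its current row count, so the number of rows doubles every time. A short Python simulation of the same doubling (a sketch, not MySQL code; build_sequence is a hypothetical name) confirms that the twelve statements after the initial 0 leave exactly the numbers 0 to 4095, with no duplicates to violate the primary key:

```python
def build_sequence(steps):
    """Simulate: insert 0, then repeatedly insert num + 2**k for every existing num."""
    nums = [0]
    for k in range(steps):
        offset = 2 ** k  # 1, 2, 4, ..., 2048, matching the num+1, num+2, ... inserts
        nums = nums + [n + offset for n in nums]
    return nums

seq = build_sequence(12)
print(len(seq), min(seq), max(seq))  # 4096 0 4095
```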
Now for the actual solution:
select person_id, count(distinct num), count(num)
from sequence
cross join (select min(start_date) min_date,
max(end_date) max_date
from t) stats
inner join t
on date_add(min_date, interval (num*24+12) hour)
between start_date and end_date
where num < datediff(max_date, min_date)
group by person_id
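This query uses the unique numbers to step through the days starting from the earliest start date, testing each day at noon (num*24+12 hours after min_date), and keeps a day for a person when that moment falls within one of the person's periods. It then counts the distinct days that satisfy this condition; count(num), by contrast, counts overlapping days once per period.
The where clause is optional, but speeds up the query.
Here is the fiddle. It produces this result: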
| Person_id | count(distinct num) | count(num) |
|-----------|---------------------|------------|
| 83244 | 369 | 386 |
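The noon-sampling idea can be reproduced in Python to see where both numbers come from. This sketch assumes the three sample periods of person 83244 (the only person in this answer's data) stored at midnight; count_days is a hypothetical name, and each num is tested at min_date plus num*24+12 hours, as in the join condition.

```python
from datetime import datetime, timedelta

# Periods for Person_id 83244 as (start_date, end_date) at midnight.
T = [
    (datetime(2014, 9, 1), datetime(2014, 9, 6)),
    (datetime(2014, 9, 8), datetime(2015, 9, 7)),
    (datetime(2015, 1, 15), datetime(2015, 2, 1)),
]

def count_days(periods):
    """Return (count(distinct num), count(num)) like the sequence-table query."""
    min_date = min(start for start, _ in periods)
    max_date = max(end for _, end in periods)
    matches = []
    # where num < datediff(max_date, min_date)
    for num in range((max_date - min_date).days):
        noon = min_date + timedelta(hours=num * 24 + 12)
        for start, end in periods:
            # inner join t on noon between start_date and end_date
            if start <= noon <= end:
                matches.append(num)
    return len(set(matches)), len(matches)

print(count_days(T))  # (369, 386)
```

Testing at noon rather than midnight keeps the inclusive between from also counting the end_date day itself, so each period contributes exactly datediff(end_date, start_date) days.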