Postgres: Find missing dates between a dataset of two columns
I am trying to write a query that returns the missing dates between two columns across multiple rows.
Example:
leases
move_in move_out hotel_id
2021-04-01 2021-04-14 1
2021-04-17 2021-04-30 1
2021-04-01 2021-04-14 2
2021-04-17 2021-04-30 2
The result should be:
date hotel_id
2021-04-15 1
2021-04-16 1
2021-04-15 2
2021-04-16 2
If you are using PostgreSQL 14+, you can use multiranges to do this:
CREATE TEMP TABLE t (
"move_in" DATE,
"move_out" DATE,
"hotel_id" INTEGER
);
INSERT INTO t
("move_in", "move_out", "hotel_id")
VALUES ('2021-04-01', '2021-04-14', '1')
, ('2021-04-17', '2021-04-30', '1')
, ('2021-05-03', '2021-05-30', '1') -- added this as a test case
, ('2021-04-01', '2021-04-14', '2')
, ('2021-04-17', '2021-04-30', '2');
SELECT hotel_id, datemultirange(DATERANGE(MIN(move_in), MAX(move_out))) - range_agg(DATERANGE(move_in, move_out, '[]')) AS r
FROM t
GROUP BY hotel_id
returns
+--------+-------------------------------------------------+
|hotel_id|r                                                |
+--------+-------------------------------------------------+
|2       |{[2021-04-15,2021-04-17)}                        |
|1       |{[2021-04-15,2021-04-17),[2021-05-01,2021-05-03)}|
+--------+-------------------------------------------------+
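A quick note on reading those bounds, as a minimal sketch: daterange is a discrete range type, so PostgreSQL stores it in the canonical half-open form [lower, upper) even when it is built with inclusive '[]' bounds. This matters when a range is expanded into individual days, because the last missing date is one day before the upper bound. The literal dates below are taken from the sample data.

```sql
-- Inclusive bounds are canonicalized to a half-open range.
SELECT daterange('2021-04-01', '2021-04-14', '[]');
-- → [2021-04-01,2021-04-15)

-- The upper bound of a gap is exclusive, so subtract one day
-- to get the last missing date.
SELECT upper(daterange('2021-04-15', '2021-04-17')) - 1;
-- → 2021-04-16
```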
If you want one row per day, you can use unnest and generate_series to expand the multiranges:
WITH available_ranges AS(
SELECT hotel_id, unnest(datemultirange(DATERANGE(MIN(move_in), MAX(move_out), '[]')) - range_agg(DATERANGE(move_in, move_out, '[]'))) AS r
FROM t
GROUP BY hotel_id
)
SELECT hotel_id, generate_series(lower(r), upper(r) - 1, '1 day'::interval)
FROM available_ranges
ORDER BY 1, 2
;
returns
+--------+---------------------------------+
|hotel_id|generate_series                  |
+--------+---------------------------------+
|1       |2021-04-15 00:00:00.000000 +00:00|
|1       |2021-04-16 00:00:00.000000 +00:00|
|1       |2021-05-01 00:00:00.000000 +00:00|
|1       |2021-05-02 00:00:00.000000 +00:00|
|2       |2021-04-15 00:00:00.000000 +00:00|
|2       |2021-04-16 00:00:00.000000 +00:00|
+--------+---------------------------------+
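The question asks for plain dates, while generate_series with an interval step returns timestamps. One possible variant, sketched here under the assumption of PostgreSQL 14+, moves generate_series into a lateral join and casts each value back to date. The sample rows are inlined in a CTE named t so the query runs on its own; the alias missing_date is just an illustrative name.

```sql
-- Sample rows inlined so the query is self-contained;
-- replace the first CTE with the real table.
WITH t(move_in, move_out, hotel_id) AS (
    VALUES ('2021-04-01'::date, '2021-04-14'::date, 1)
         , ('2021-04-17', '2021-04-30', 1)
         , ('2021-05-03', '2021-05-30', 1)
         , ('2021-04-01', '2021-04-14', 2)
         , ('2021-04-17', '2021-04-30', 2)
),
available_ranges AS (
    -- Whole span per hotel minus the union of its leased ranges.
    SELECT hotel_id,
           unnest(datemultirange(daterange(min(move_in), max(move_out), '[]'))
                  - range_agg(daterange(move_in, move_out, '[]'))) AS r
    FROM t
    GROUP BY hotel_id
)
SELECT a.hotel_id, g.day::date AS missing_date
FROM available_ranges AS a
-- upper(r) is exclusive, so the last missing day is upper(r) - 1.
CROSS JOIN LATERAL generate_series(lower(a.r), upper(a.r) - 1, interval '1 day') AS g(day)
ORDER BY a.hotel_id, missing_date;
```

With the sample rows this should list the same six days as above, just typed as date instead of timestamp with time zone.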
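You are finding the difference between two sets. One is the days each hotel is leased. The other is all the days in April. And you are doing this for every hotel.
We can build the set of all days in April for all hotels. First we need the set of all days in April: generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day').
Then we need to cross join it with all the hotel IDs.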
select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
Now, for each day, we can left join the leases that cover that day.
select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
left join leases on day between move_in and leases.move_out and hotel_id = hotels.id
Any day without a lease will have a null leases.id (this assumes leases has an id column), so filter on that.
select day, hotels.id
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
left join leases on day between move_in and leases.move_out and hotel_id = hotels.id
where leases.id is null
order by hotels.id, day
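The sample table in the question only shows move_in, move_out and hotel_id, so leases.id may not exist. A hedged alternative is to express the same anti-join with NOT EXISTS, which needs no id column. This is a sketch rather than the original answer's query: the rows are inlined in a CTE named leases so it runs on its own, and missing_date is an illustrative alias.

```sql
-- Sample rows inlined so the query is self-contained;
-- drop the CTE to run against the real leases table.
with leases(move_in, move_out, hotel_id) as (
    values ('2021-04-01'::date, '2021-04-14'::date, 1)
         , ('2021-04-17', '2021-04-30', 1)
         , ('2021-04-01', '2021-04-14', 2)
         , ('2021-04-17', '2021-04-30', 2)
)
select dates.day::date as missing_date, hotels.id as hotel_id
from generate_series('2021-04-01'::date, '2021-04-30'::date, interval '1 day') as dates(day)
cross join (
    select distinct hotel_id
    from leases
) as hotels(id)
-- Keep only the (day, hotel) pairs that no lease covers.
where not exists (
    select 1
    from leases as l
    where l.hotel_id = hotels.id
      and dates.day between l.move_in and l.move_out
)
order by hotels.id, dates.day;
```

For the sample data this should return 2021-04-15 and 2021-04-16 for each hotel, matching the result asked for in the question.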