PostgreSQL select: intervals and filling in the blanks
I'm developing a system to manage problems in different projects.
I have the following tables:
Projects
id | Description | Country |
---|---|---|
1 | 3D experience | Brazil |
2 | Lorem Epsum | Chile |
Problems
id | idProject | Description |
---|---|---|
1 | 1 | Not loading |
2 | 1 | Breaking down |
Problems_status
id | idProblem | Status | Start_date | End_date |
---|---|---|---|---|
1 | 1 | Red | 2020-10-17 | 2020-10-25 |
2 | 1 | Yellow | 2020-10-25 | 2020-11-20 |
3 | 1 | Red | 2020-11-20 | |
4 | 2 | Red | 2020-11-01 | 2020-11-25 |
5 | 2 | Yellow | 2020-11-25 | 2020-12-22 |
6 | 2 | Red | 2020-12-22 | 2020-12-23 |
7 | 2 | Green | 2020-12-23 | |
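In the example above, problem 1 is still Red and problem 2 is Green (no end date).
When the user selects a specific project, I need to create a chart showing the status of its problems week by week (starting from the week in which the first problem was registered). The chart for project 1 should look like this:
I'm trying to write a PostgreSQL query that returns a table I can use to populate this chart: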
Week | Green | Yellow | Red |
---|---|---|---|
42/20 | 0 | 0 | 1 |
43/20 | 0 | 0 | 1 |
44/20 | 0 | 1 | 0 |
... | ... | ... | ... |
04/21 | 1 | 0 | 1 |
I've tried many approaches but can't figure out how to do it. Can someone help me?
Here is a db-fiddle to help:
CREATE TABLE projects (
    id serial NOT NULL,
    description character varying(50) NOT NULL,
    country character varying(50) NOT NULL,
    CONSTRAINT projects_pkey PRIMARY KEY (id)
);
CREATE TABLE problems (
    id serial NOT NULL,
    id_project integer NOT NULL,
    description character varying(50) NOT NULL,
    CONSTRAINT problems_pkey PRIMARY KEY (id),
    CONSTRAINT problems_id_project_fkey FOREIGN KEY (id_project)
        REFERENCES projects (id) MATCH SIMPLE
);
CREATE TABLE problems_status (
    id serial NOT NULL,
    id_problem integer NOT NULL,
    status character varying(50) NOT NULL,
    start_date date NOT NULL,
    end_date date,
    CONSTRAINT problems_status_pkey PRIMARY KEY (id),
    CONSTRAINT problems_status_id_problem_fkey FOREIGN KEY (id_problem)
        REFERENCES problems (id) MATCH SIMPLE
);
INSERT INTO projects (description, country) VALUES ('3D experience','Brazil');
INSERT INTO projects (description, country) VALUES ('Lorem Epsum','Chile');
INSERT INTO problems (id_project, description) VALUES (1,'Not loading');
INSERT INTO problems (id_project, description) VALUES (1,'Breaking down');
INSERT INTO problems_status (id_problem, status, start_date, end_date) VALUES
    (1, 'Red', '2020-10-17', '2020-10-25'),(1, 'Yellow', '2020-10-25', '2020-11-20'),
    (1, 'Red', '2020-11-20', NULL),(2, 'Red', '2020-11-01', '2020-11-25'),
    (2, 'Yellow', '2020-11-25', '2020-12-22'),(2, 'Red', '2020-12-22', '2020-12-23'),
    (2, 'Green', '2020-12-23', NULL);
You can fill in the blanks with COALESCE, which returns the first non-null value in its argument list.
SELECT COALESCE(<some_value_that_could_be_null>, <some_value_that_will_not_be_null>);
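Applied to the question's data, this might look like the sketch below. It assumes the problems_status table from the question's DDL; the column alias effective_end is made up for illustration.

```sql
-- Treat a status with no end date as running until today.
SELECT id_problem
     , status
     , start_date
     , COALESCE(end_date, CURRENT_DATE) AS effective_end  -- hypothetical alias
FROM problems_status
ORDER BY id_problem, start_date;
```

Rows 3 and 7 of the sample data (the open Red and Green statuses) are the ones that would pick up CURRENT_DATE here.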
If you want to force the bounds of the time range into the result set, you can UNION in a result set containing the specific dates.
SELECT ... -- your data query here
UNION ALL
SELECT end_ts -- WHERE end_ts is a timestamptz type
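For a UNION, every query in the union has to return the same number of columns with matching types. You can fill in everything other than the timestamp with NULL cast to the matching type.
A more concrete example: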
WITH data AS -- get raw data
(
    SELECT p.id
         , ps.status
         , ps.start_date
         , COALESCE(ps.end_date, CURRENT_DATE) AS end_date -- fill in NULL values with COALESCE (a fixed date such as '2025-01-01'::DATE would also work)
         , pj.country
         , pj.description
         , MAX(ps.start_date) OVER (PARTITION BY p.id) AS latest_update
    FROM problems p
    JOIN projects pj ON (pj.id = p.id_project)
    JOIN problems_status ps ON (p.id = ps.id_problem)
    UNION ALL -- force bounds in the following
    SELECT NULL::INTEGER -- could be null or a defaulted value
         , NULL::TEXT -- could be null or a defaulted value
         , start_date -- either as an input param to a function or a hard-coded date
         , end_date -- either as an input param to a function or a hard-coded date
         , NULL::TEXT
         , NULL::TEXT
         , NULL::DATE
) -- aggregate in the following
SELECT <week> -- you'll have to figure out how you're getting weeks out of the DATE data
     , COUNT(*) FILTER (WHERE status = 'Red')
     , COUNT(*) FILTER (WHERE status = 'Yellow')
     , COUNT(*) FILTER (WHERE status = 'Green')
FROM data
WHERE start_date = latest_update
GROUP BY <week>
;
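Some of the features used in this query are very powerful. If you're not familiar with them and you're going to be writing a lot of reporting queries, you should look them up: mainly COALESCE, common table expressions (CTEs), window functions, and aggregate expressions.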
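The `<week>` placeholder in the query above is left open. One possible way to fill it, offered only as a sketch, is to_char with the ISO patterns IW (ISO week number) and IY (last two digits of the ISO year), which produces labels in the same 42/20 shape the question asks for. The sample dates are taken from the question's data; the alias week_label is made up.

```sql
-- Derive an ISO "week/year" label from a date.
SELECT d                     AS day
     , to_char(d, 'IW/IY')   AS week_label  -- hypothetical alias
FROM (VALUES (DATE '2020-10-17')
           , (DATE '2020-12-23')
           , (DATE '2021-01-03')) AS t(d);
```

For these three dates the labels should come out as 42/20, 52/20 and 53/20. The last one shows why the ISO year is used rather than the calendar year: 3 January 2021 still belongs to ISO week 53 of 2020.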
After you updated your requirements, I wrote a dbfiddle for you to look at here.
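If I understand correctly, your goal is to produce a weekly status count for a specific project over a time period (from the earliest date in the database to the current date), based on problem status. Additionally, if a problem status spans a week, it should be included in that week's tally. That involves two time periods, the report period and the status start/end dates, and checking whether they overlap. There are several overlap scenarios to check. Call A any week in the report period and B the start/end of a status. Given that A must end within the reporting period but B need not, we have the following:
- A starts, B starts, A ends, B ends. B overlaps the end of A.
- A starts, B starts, B ends, A ends. B is fully contained in A.
- B starts, A starts, B ends, A ends. B overlaps the start of A.
- B starts, A starts, A ends, B ends. A is fully enclosed in B.
Fortunately, Postgres provides functionality that handles all of the above, so the query does not have to test each case individually: DATERANGEs and the overlap operator (&&). The difficult work then becomes defining each week within A. Then apply the overlap operator to the daterange for each week in A against the daterange for B (start_date, end_date). Then do conditional aggregation for each overlap detected. See full example here.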
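The scenarios listed above can be tried out directly with the overlap operator. The following is only a sketch with invented dates: A is week 43 of 2020, each B row is a hypothetical status period, and an open-ended period plus a non-overlapping one are added for contrast.

```sql
WITH a(week_dates) AS (
    -- two-argument daterange is half-open: Monday up to, not including, the next Monday
    SELECT daterange(DATE '2020-10-19', DATE '2020-10-26')
)
, b(scenario, status_dates) AS (
    VALUES ('B overlaps the end of A',   daterange(DATE '2020-10-23', DATE '2020-10-30'))
         , ('B fully inside A',          daterange(DATE '2020-10-20', DATE '2020-10-22'))
         , ('B overlaps the start of A', daterange(DATE '2020-10-15', DATE '2020-10-21'))
         , ('A fully inside B',          daterange(DATE '2020-10-01', DATE '2020-11-30'))
         , ('B open-ended (NULL end)',   daterange(DATE '2020-10-24', NULL))
         , ('B entirely before A',       daterange(DATE '2020-10-01', DATE '2020-10-10'))
)
SELECT b.scenario
     , a.week_dates && b.status_dates AS is_overlap  -- one operator covers every case
FROM a
CROSS JOIN b;
```

Only the last row should report false. A NULL upper bound makes the range unbounded, which is why a status without an end_date still matches every later week.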
with problem_list( problem_id ) as
     -- identify the specific problem_ids desired
     (select ps.id
        from projects p
        join problems ps on (ps.id_project = p.id)
       where p.id = &selected_project   -- placeholder for the selected project id
     ) --select * from problem_list;
   , report_period(srange, erange) as
     -- generate the first day of week (Mon) for the
     -- oldest start date through the week of Current_Date
     (select min(first_of_week(ps.start_date))
           , first_of_week(current_date)
        from problems_status ps
        join problem_list pl
          on (pl.problem_id = ps.id_problem)
     ) --select * from report_period;
   , weekly_calendar(wk, yr, week_dates) as
     -- expand the start, end date ranges to week dates (Mon-Sun)
     -- and identify the week number with year
     (select extract( week from mon)::integer wk
           , extract( isoyear from mon)::integer yr
           , daterange(mon, mon+6, '[]'::text) wk_dates
        from (select generate_series(srange, erange, interval '7 days')::date mon
                from report_period
             ) d
     ) -- select * from weekly_calendar;
   , status_by_week(yr, wk, status) as
     -- determine where problem start_date, end_date overlaps each calendar week
     -- then where multiple statuses exist for any week keep only the latest
     ( select yr, wk, status
         from (select wc.yr, wc.wk, ps.status
                    -- , ps.start_date, wc.week_dates, id_problem
                    , row_number() over (partition by ps.id_problem, wc.yr, wc.wk
                                             order by ps.start_date desc) rn
                 from problems_status ps
                 join problem_list pl on (pl.problem_id = ps.id_problem)
                 join weekly_calendar wc on (wc.week_dates && daterange(ps.start_date, ps.end_date)) -- actual overlap test
              ) ac
        where rn = 1
     ) -- select * from status_by_week order by wk;
select 'Project ' || p.id || ': ' || p.description Project
     , to_char(wk,'fm09') || '/' || substr(to_char(yr,'fm0000'),3) "WK"
     , "Red", "Yellow", "Green"
  from projects p
 cross join (select sbw.yr, sbw.wk
                  , count(*) filter (where sbw.status = 'Red') "Red"
                  , count(*) filter (where sbw.status = 'Yellow') "Yellow"
                  , count(*) filter (where sbw.status = 'Green') "Green"
               from status_by_week sbw
              group by sbw.yr, sbw.wk
            ) sr
 where p.id = &selected_project   -- same placeholder as above
 order by yr, wk;
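The CTEs and the main query operate as follows:
- problem_list: identifies the problems (id_problem) related to the specified project.
- report_period: identifies the full reporting period, from start to end.
- weekly_calendar: generates the start date (Monday) and end date (Sunday) of each week within the reporting period (A above). Along the way it also gathers the week of the year and the ISO year.
- status_by_week: this is the real workhorse, performing two tasks. First it passes each problem through each week in the calendar and builds a row for each overlap detected. Then it enforces the "one status" rule.
- Finally, the main select aggregates the statuses into the appropriate buckets and adds the syntactic sugar for the project name.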
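To see the weekly_calendar step on its own, here is a minimal standalone sketch. It uses two hard-coded dates from the sample data instead of the report_period CTE, and the built-in date_trunc in place of first_of_week(); both substitutions are assumptions made so that the snippet runs without any tables or user-defined functions.

```sql
-- One row per ISO week between two example dates.
SELECT mon::date                              AS week_start
     , extract(week    from mon)::integer     AS wk
     , extract(isoyear from mon)::integer     AS yr
     , daterange(mon::date, mon::date + 7)    AS week_dates  -- Monday up to, not including, next Monday
FROM generate_series( date_trunc('week', TIMESTAMP '2020-10-17')  -- Monday of the first week
                    , date_trunc('week', TIMESTAMP '2020-11-20')  -- Monday of the last week
                    , interval '7 days') AS g(mon);
```

This should list six weeks, starting with Monday 2020-10-12 (week 42) and ending with Monday 2020-11-16 (week 47).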
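Note the function first_of_week(). This is a user-defined function, available in the example and shown below. I created it some time ago and have found it useful. You are free to use it, but you do so without any claim of suitability or guarantee.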
create or replace
function first_of_week(date_in date)
 returns date
 language sql
 immutable strict
 /*
  * Given a date return the first day of the week according to ISO-8601
  *
  * ISO-8601 Standard (in short)
  *   1 All weeks begin on Monday.
  *   2 All weeks have exactly 7 days.
  *   3 The first week of any year begins on the Monday on or before 4-Jan.
  *     This implies that the last few days of Dec may be in the
  *     first week of the following year and that the first few
  *     days of Jan may be in week 52 or 53 of the prior year.
  *     (Not at the same time obviously.)
  *
  */
as $$
    -- isodow runs from 1 (Monday) to 7 (Sunday); the array maps it to days to subtract
    with wk_adj(l_days) as (values (array[0,1,2,3,4,5,6]))
    select date_in - l_days[ extract (isodow from date_in)::integer ]
      from wk_adj;
$$;
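If you would rather not install a user-defined function, the built-in date_trunc('week', ...) also truncates to the Monday of the ISO week, so it can serve as a stand-in for first_of_week(). This is a suggested alternative, not part of the fiddle; the alias monday_of_week is made up.

```sql
-- Built-in way to get the Monday of the week containing each date.
SELECT d                              AS day
     , date_trunc('week', d)::date    AS monday_of_week  -- hypothetical alias
FROM (VALUES (DATE '2020-10-17')    -- a Saturday
           , (DATE '2020-10-19')    -- a Monday
           , (DATE '2020-10-25'))   -- a Sunday
     AS t(d);
```

The three rows should map to 2020-10-12, 2020-10-19 and 2020-10-19 respectively.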
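In the example I implemented the query as a SQL function, because db<>fiddle seems to have problems with bind variables and substitution variables, and a function also gives you parameterization (I hate hard-coded values). For the example I added more data for extra testing, mainly rows that will not be selected. There is also an additional status, to show what happens when the query encounters something other than the three expected status values (Pink, in this case). This is easy to remove: just get rid of Other.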
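Your observation that "the date range covers Monday to Monday, not Monday to Sunday" is incorrect, although it can look that way to someone not used to reading ranges. Take week 43 as an example. If you query the date range it will show [2020-10-19,2020-10-26), and yes, both of those dates are Mondays. However, the bracket characters have meaning. The leading [ means the date is included, and the trailing ) means the date is not included. The condition: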
somedate <@ '[2020-10-19,2020-10-26)'::daterange
is the same as
somedate >= date '2020-10-19' and somedate < date '2020-10-26'
That is why, when you changed the increment from "mon+6" to "mon+5", you fixed week 43 but introduced errors in other weeks.
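You can check the bracket behaviour yourself with a query like the sketch below, which builds week 43 the same way the calendar does (Monday plus 6 days, both bounds inclusive). The column aliases are made up for illustration.

```sql
SELECT daterange(DATE '2020-10-19', DATE '2020-10-19' + 6, '[]')                        AS week_43
     , DATE '2020-10-25' <@ daterange(DATE '2020-10-19', DATE '2020-10-19' + 6, '[]')   AS sunday_included
     , DATE '2020-10-26' <@ daterange(DATE '2020-10-19', DATE '2020-10-19' + 6, '[]')   AS next_monday_included;
```

Postgres stores date ranges in the canonical [) form, so week_43 is displayed as [2020-10-19,2020-10-26) even though it was built with inclusive bounds. Sunday the 25th is inside the range and Monday the 26th is not. In the sample data, problem 1 turns Yellow on 2020-10-25, a Sunday, so that status legitimately overlaps week 43; shrinking the week to mon+5 hides it there but also drops every other Sunday from the calendar.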