PostgreSQL select interval and fill blanks

Question

I'm developing a system to manage problems in different projects.

I have the following tables:

Projects

id	Description	Country
1	3D experience	Brazil
2	Lorem Epsum	Chile

Problems

id	idProject	Description
1	1	Not loading
2	1	Breaking down

Problems_status

id	idProblem	Status	Start_date	End_date
1	1	Red	2020-10-17	2020-10-25
2	1	Yellow	2020-10-25	2020-11-20
3	1	Red	2020-11-20
4	2	Red	2020-11-01	2020-11-25
5	2	Yellow	2020-11-25	2020-12-22
6	2	Red	2020-12-22	2020-12-23
7	2	Green	2020-12-23

In the example above, problem 1 is still Red and problem 2 is Green (no end date).

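When the user selects a specific project, I need to create a chart showing the status of its problems for each week, starting from the week in which the first problem was registered.

I'm trying to write PostgreSQL code that returns a table I can use to populate this chart. For project 1 it should look like this: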

Week	Green	Yellow	Red
42/20	0	0	1
43/20	0	0	1
44/20	0	1	0
...	...	...	...
04/21	1	0	1

I've been trying several approaches but just can't figure out how to do it. Can anyone help? Here is a db-fiddle setup to help:

CREATE TABLE projects (
  id serial NOT NULL,
  description character varying(50) NOT NULL,
  country character varying(50) NOT NULL,
  CONSTRAINT projects_pkey PRIMARY KEY (id)
);

CREATE TABLE problems (
  id serial NOT NULL,
  id_project integer NOT NULL,
  description character varying(50) NOT NULL,
  CONSTRAINT problems_pkey PRIMARY KEY (id),
  CONSTRAINT problems_id_project_fkey FOREIGN KEY (id_project)
      REFERENCES projects (id) MATCH SIMPLE
);

CREATE TABLE problems_status (
  id serial NOT NULL,
  id_problem integer NOT NULL,
  status character varying(50) NOT NULL,
  start_date date NOT NULL,
  end_date date,
  CONSTRAINT problems_status_pkey PRIMARY KEY (id),
  CONSTRAINT problems_status_id_problem_fkey FOREIGN KEY (id_problem)
      REFERENCES problems (id) MATCH SIMPLE
);

INSERT INTO projects (description, country) VALUES ('3D experience','Brazil');
INSERT INTO projects (description, country) VALUES ('Lorem Epsum','Chile');
INSERT INTO problems (id_project ,description) VALUES (1,'Not loading');
INSERT INTO problems (id_project ,description) VALUES (1,'Breaking down');
INSERT INTO problems_status (id_problem, status, start_date, end_date) VALUES
(1, 'Red', '2020-10-17', '2020-10-25'),(1, 'Yellow', '2020-10-25', '2020-11-20'),
(1, 'Red', '2020-11-20', NULL),(2, 'Red', '2020-11-01', '2020-11-25'),
(2, 'Yellow', '2020-11-25', '2020-12-22'),(2, 'Red', '2020-12-22', '2020-12-23'),
(2, 'Green', '2020-12-23', NULL);

Answer 1

You can use COALESCE to fill in blanks: it returns the first non-null value in its argument list.

SELECT COALESCE(<some_value_that_could_be_null>, <some_value_that_will_not_be_null>);
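A minimal sketch of that idea against the question's data. The two sample rows are inlined with VALUES (an assumption, so the snippet runs without the tables), and treating an open status as lasting until CURRENT_DATE is just one possible choice:

```sql
-- Open statuses (end_date IS NULL) are treated as running until today.
SELECT s.id_problem
     , s.status
     , s.start_date
     , COALESCE(s.end_date, CURRENT_DATE) AS end_date  -- first non-null argument wins
  FROM (VALUES (1, 'Red',    DATE '2020-11-20', NULL::date)
             , (2, 'Yellow', DATE '2020-11-25', DATE '2020-12-22')
       ) AS s(id_problem, status, start_date, end_date);
```

The closed status keeps its own end date; only the open one picks up today's date.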

If you want to force the bounds of the time range into the result set, you can UNION in a result set that contains those specific dates.

SELECT ... -- your data query here
UNION ALL
SELECT end_ts -- WHERE end_ts is a timestamptz type

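For the UNION to work, both queries must return the same number of columns with compatible types. You can fill in everything other than the timestamp with NULL cast to the matching type.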
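For illustration, a small sketch of the bound-forcing UNION ALL. The inlined rows and the use of CURRENT_DATE as the upper bound are assumptions, not part of the original answer:

```sql
-- Real rows plus one forced "bound" row; the second branch pads the
-- column it has no value for with a NULL cast to the matching type.
SELECT s.status::text AS status
     , s.start_date
  FROM (VALUES ('Red',    DATE '2020-10-17')
             , ('Yellow', DATE '2020-10-25')) AS s(status, start_date)
 UNION ALL
SELECT NULL::text        -- the bound row has no status
     , CURRENT_DATE      -- hypothetical upper bound of the reporting period
 ORDER BY start_date;
```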

A more concrete example:

WITH data AS -- get raw data
(
    SELECT p.id
         , ps.status
         , ps.start_date
         , COALESCE(ps.end_date, CURRENT_DATE) AS end_date -- you can fill in NULL values with COALESCE
         , pj.country
         , pj.description
         , MAX(ps.start_date) OVER (PARTITION BY p.id) AS latest_update
      FROM problems p
      JOIN projects pj ON (pj.id = p.id_project)
      JOIN problems_status ps ON (p.id = ps.id_problem)
     UNION ALL -- force bounds in the following
    SELECT NULL::INTEGER     -- could be null or a defaulted value
         , NULL::TEXT        -- could be null or a defaulted value
         , DATE '2020-10-17' -- either an input param to a function or a hard-coded date
         , CURRENT_DATE      -- either an input param to a function or a hard-coded date
         , NULL::TEXT
         , NULL::TEXT
         , NULL::DATE
) -- aggregate in the following
SELECT to_char(start_date, 'IW/IY') AS week -- one way to get ISO week/year out of the DATE data
     , COUNT(*) FILTER (WHERE status = 'Red')    AS red
     , COUNT(*) FILTER (WHERE status = 'Yellow') AS yellow
     , COUNT(*) FILTER (WHERE status = 'Green')  AS green
  FROM data
 WHERE start_date = latest_update
    OR latest_update IS NULL -- keep the forced bound row
 GROUP BY 1
;

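Some of the features used in this query are very powerful; if you are not familiar with them and you will be writing a lot of reporting queries, you should look them up. Mainly: COALESCE, common table expressions (CTEs), window functions, and aggregate expressions.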

Aggregate Expressions

WITH Queries (CTEs)

COALESCE

Window Functions
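As a quick illustration of the aggregate FILTER clause used above (conditional aggregation), here is a self-contained sketch over a hypothetical list of statuses:

```sql
-- Conditional aggregation: one COUNT per status bucket, in a single pass.
SELECT COUNT(*) FILTER (WHERE status = 'Green')  AS green
     , COUNT(*) FILTER (WHERE status = 'Yellow') AS yellow
     , COUNT(*) FILTER (WHERE status = 'Red')    AS red
  FROM (VALUES ('Red'), ('Yellow'), ('Red'), ('Green')) AS s(status);
-- green = 1, yellow = 1, red = 2
```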

After you updated your requirements, I wrote a dbfiddle for you to look at here.

Answer 2

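If I understand correctly, your goal is to generate a table of weekly statistics for the problem statuses of a specific project over a given period (from the earliest date in the database to the current date). In addition, if a problem status spans a week, it should be included in that week's statistics. This involves 2 time periods, the reporting period and the start/end dates of each status, and checking whether they overlap. There are 4 overlap scenarios to check; let's call A any week of the reporting period and B the start/end of a status. A is required to end within the reporting period, but B is not. That gives us the following:

A starts, B starts, A ends, B ends: B overlaps the end of A.
A starts, B starts, B ends, A ends: B is completely contained in A.
B starts, A starts, B ends, A ends: B overlaps the start of A.
B starts, A starts, A ends, B ends: A is completely enclosed in B.

Fortunately, Postgres provides functionality that handles all of the above, which means the query does not have to validate each case separately: DATERANGEs and the overlap operator (&&). The difficult work then becomes defining each week within A. Then apply the overlap operator to the daterange for each week in A against the daterange for B (start_date, end_date), and do conditional aggregation for each overlap detected. See the full example here.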
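The four scenarios can be checked directly with the overlap operator. This sketch takes A as ISO week 43 of 2020 and uses illustrative B intervals inlined with VALUES (they are not from the original fiddle); a NULL end date produces an unbounded range, just like an open status:

```sql
-- A = ISO week 43 of 2020, Monday..Sunday inclusive.
SELECT b.scenario
     , daterange(DATE '2020-10-19', DATE '2020-10-25', '[]')
       && daterange(b.start_date, b.end_date) AS overlaps  -- NULL end = unbounded
  FROM (VALUES ('1: B overlaps end of A',   DATE '2020-10-25', DATE '2020-11-20')
             , ('2: B inside A',            DATE '2020-10-20', DATE '2020-10-22')
             , ('3: B overlaps start of A', DATE '2020-10-17', DATE '2020-10-25')
             , ('4: A inside B',            DATE '2020-10-17', NULL::date)
             , ('no overlap',               DATE '2020-11-01', DATE '2020-11-25')
       ) AS b(scenario, start_date, end_date);
-- overlaps: true, true, true, true, false
```

A single && test covers all four cases, which is why the main query needs no CASE logic for them.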

with  problem_list( problem_id ) as 
       -- identify the specific problem_ids desired
       (select ps.id 
          from projects p
          join problems ps on(ps.id_project =  p.id)
         where p.id  = :selected_project  -- psql variable, e.g. \set selected_project 1
       )  --select * from problem_list;
      
  , report_period(srange, erange) as 
       -- generate the first day of week (Mon) for the
       -- oldest start date through the first day of the week of current_date
       (select min(first_of_week(ps.start_date))  
             , first_of_week(current_date)
          from problems_status ps
          join problem_list pl 
            on (pl.problem_id = ps.id_problem)
       )  --select * from report_period; 
      
  , weekly_calendar(wk,yr, week_dates) as 
       -- expand the start, end date ranges to week dates (Mon-Sun) 
       -- and identify the week number with year
       (select extract( week from mon)::integer wk
             , extract( isoyear from mon)::integer yr
             , daterange(mon, mon+6, '[]'::text) wk_dates
          from (select generate_series(srange,erange, interval '7 days')::date mon
                  from  report_period
               ) d
       )  -- select * from weekly_calendar; 
   , status_by_week(yr,wk,status) as   
     -- determine where problem start_date, end_date overlaps each calendar week
     -- then where multiple statuses exist for any week keep only the latest
        ( select yr,wk,status  
            from (select  wc.yr,wc.wk,ps.status 
                 --   ,  ps.start_date, wc.week_dates,id_problem 
                      , row_number() over (partition by ps.id_problem,yr,wk order by yr, wk, start_date desc)  rn
                   from problems_status  ps 
                    join problem_list   pl on (pl.problem_id = ps.id_problem)
                    join weekly_calendar wc on (wc.week_dates && daterange(ps.start_date,ps.end_date))  -- actual overlap test  
                 ) ac
           where rn=1
        ) -- select * from status_by_week order by wk;
select 'Project ' || p.id || ': ' || p.description Project
    , to_char(wk,'fm09') || '/' || substr(to_char(yr,'fm0000'),3) "WK"
    , "Red", "Yellow", "Green"
 from projects p
cross join (select sbw.yr,sbw.wk 
                 , count(*) filter (where sbw.status = 'Red')    "Red"
                 , count(*) filter (where sbw.status = 'Yellow') "Yellow"
                 , count(*) filter (where sbw.status = 'Green')  "Green" 
              from status_by_week sbw 
             group by sbw.yr, sbw.wk
           ) sr
where p.id  = :selected_project
order by yr,wk;

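The CTEs and the main operations are as follows:

problem_list: identifies the problems (id_problem) belonging to the specified project.
report_period: identifies the complete reporting period, from start to end.
weekly_calendar: generates the start date (Monday) and end date (Sunday) of each week in the reporting period (A above). Along the way it also collects the week of the year and the ISO year.
status_by_week: this is the real workhorse, and it performs two tasks. First, it passes each problem through every week in the calendar and builds a row for each overlap detected. Then it enforces the "one status" rule: only the latest status of each problem is kept for any given week.
Finally, the main select aggregates the statuses into the appropriate buckets and adds syntactic sugar to get the project name.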
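To see what the weekly_calendar step produces, here is a standalone sketch with hard-coded bounds: 2020-10-12 is the Monday of the week containing the first sample status, and the end date 2020-11-02 is an arbitrary assumption. It also uses to_char with the ISO week patterns IW and IY as an alternative way to build the "42/20" style labels:

```sql
-- One row per ISO week; the '[]' range is displayed in canonical '[)' form.
SELECT to_char(d::date, 'IW/IY')                AS week_label  -- e.g. 42/20
     , daterange(d::date, d::date + 6, '[]')    AS week_dates
  FROM generate_series(DATE '2020-10-12', DATE '2020-11-02', interval '7 days') AS d;
-- week_label: 42/20, 43/20, 44/20, 45/20
```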

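Note the function first_of_week(). It is a user-defined function, available in the example and below. I created it some time ago and have found it useful. You are free to use it, but you do so without any claim of suitability or guarantee.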

create or replace
function first_of_week(date_in date)
 returns date
language sql
immutable strict
/*
 * Given a date return the first day of the week according to ISO-8601
 * 
 *    ISO-8601 Standard (in short) 
 *    1 All weeks begin on Monday.
 *    2 All weeks have exactly 7 days.
 *    3 The first week of any year begins on the Monday on or before 4-Jan.
 *      This implies that the last few days of Dec may be in the 
 *      first week of the following year and that the first few 
 *      days of Jan may be in week 52 (or 53) of the prior year.
 *      (Not at the same time, obviously.)  
 *  
 */ 
as $$
   with wk_adj(l_days) as (values  (array[0,1,2,3,4,5,6]))
   select date_in - l_days[ extract (isodow from date_in)::integer ]
     from wk_adj;
$$;
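If you would rather not maintain a custom function, PostgreSQL's built-in date_trunc('week', ...) also truncates to the ISO-8601 Monday. This sketch compares it with the same arithmetic first_of_week() performs (step back isodow - 1 days), over a range that crosses a year boundary; it does not require the function to be installed:

```sql
-- Two ways to get the ISO-8601 Monday of a date; both columns should always match.
SELECT d::date                                           AS some_day
     , d::date - (extract(isodow FROM d)::integer - 1)   AS monday_by_isodow  -- same arithmetic as first_of_week()
     , date_trunc('week', d)::date                       AS monday_by_trunc   -- built-in ISO week truncation
  FROM generate_series(DATE '2020-12-28', DATE '2021-01-10', interval '1 day') AS d;
```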

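In the example I implemented the query as an SQL function, because db<>fiddle seems to have problems with bind and substitution variables; it also provides the ability to parameterize (I hate hard-coded values). In the example I also added extra data for additional testing, mostly data that will not be selected. There is also an extra status, to show what happens when the query encounters something other than those 3 status values (in this case Pink). That is easy to remove: just get rid of the extra one.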

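Your observation that "the date ranges cover Monday to Monday, not Monday to Sunday" is not correct, although it can look that way to someone not used to reading them. Take week 43 as an example. If you query the date range it shows [2020-10-19,2020-10-26), and yes, both of those dates are Mondays. However, the bracket characters are significant: the leading [ means the date is included, and the trailing ) means the date is excluded. The condition: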

'[2020-10-19,2020-10-26)'::daterange @> somedate
is the same as
somedate >= '2020-10-19' and somedate < '2020-10-26'

That is why, when you changed the increment from "mon+6" to "mon+5", you fixed week 43 but introduced errors in other weeks.
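A short sketch that shows this canonicalization, using the week 43 dates discussed above. Because daterange is a discrete range type, PostgreSQL stores an inclusive upper bound as the following day with an exclusive bracket:

```sql
-- mon+6 with '[]' keeps Sunday; mon+5 silently drops Sunday from every week.
SELECT daterange(DATE '2020-10-19', DATE '2020-10-25', '[]')                    AS mon_plus_6       -- [2020-10-19,2020-10-26)
     , daterange(DATE '2020-10-19', DATE '2020-10-24', '[]')                    AS mon_plus_5       -- [2020-10-19,2020-10-25)
     , daterange(DATE '2020-10-19', DATE '2020-10-25', '[]') @> DATE '2020-10-25' AS sunday_included; -- true
```

With mon+5, a status that starts on a Sunday (such as the Yellow status starting 2020-10-25) no longer overlaps the week it starts in, which is why one week looked fixed while others broke.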
