PostgreSQL select interval and fill blanks

I'm developing a system to manage problems in different projects.

I have the following tables:

Projects

id  Description    Country
1   3D experience  Brazil
2   Lorem Epsum    Chile

Problems

id  idProject  Description
1   1          Not loading
2   1          Breaking down

Problems_status

id  idProblem  Status  Start_date  End_date
1   1          Red     2020-10-17  2020-10-25
2   1          Yellow  2020-10-25  2020-11-20
3   1          Red     2020-11-20
4   2          Red     2020-11-01  2020-11-25
5   2          Yellow  2020-11-25  2020-12-22
6   2          Red     2020-12-22  2020-12-23
7   2          Green   2020-12-23

In the example above, problem 1 is still red and problem 2 is green (neither current status has an end date).

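When the user selects a specific project, I need to create a chart showing the status of its problems for each week, starting from the week of the first registered problem. The chart for project 1 should look like this: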

I'm trying to write code in PostgreSQL that returns a table I can use to populate this chart:

Week   Green  Yellow  Red
42/20  0      0       1
43/20  0      0       1
44/20  0      1       0
...    ...    ...     ...
04/21  1      0       1

I've tried several approaches but just can't figure out how to do it. Can anyone help? Here is a db-fiddle setup to help:

CREATE TABLE projects (
  id serial NOT NULL,
  description character varying(50) NOT NULL,
  country character varying(50) NOT NULL,
  CONSTRAINT projects_pkey PRIMARY KEY (id)
);

CREATE TABLE problems (
  id serial NOT NULL,
  id_project integer NOT NULL,
  description character varying(50) NOT NULL,
  CONSTRAINT problems_pkey PRIMARY KEY (id),
  CONSTRAINT problems_id_project_fkey FOREIGN KEY (id_project)
      REFERENCES projects (id) MATCH SIMPLE
);

CREATE TABLE problems_status (
  id serial NOT NULL,
  id_problem integer NOT NULL,
  status character varying(50) NOT NULL,
  start_date date NOT NULL,
  end_date date,
  CONSTRAINT problems_status_pkey PRIMARY KEY (id),
  CONSTRAINT problems_status_id_problem_fkey FOREIGN KEY (id_problem)
      REFERENCES problems (id) MATCH SIMPLE
);

INSERT INTO projects (description, country) VALUES ('3D experience','Brazil');
INSERT INTO projects (description, country) VALUES ('Lorem Epsum','Chile');
INSERT INTO problems (id_project ,description) VALUES (1,'Not loading');
INSERT INTO problems (id_project ,description) VALUES (1,'Breaking down');
INSERT INTO problems_status (id_problem, status, start_date, end_date) VALUES
(1, 'Red', '2020-10-17', '2020-10-25'),(1, 'Yellow', '2020-10-25', '2020-11-20'),
(1, 'Red', '2020-11-20', NULL),(2, 'Red', '2020-11-01', '2020-11-25'),
(2, 'Yellow', '2020-11-25', '2020-12-22'),(2, 'Red', '2020-12-22', '2020-12-23'),
(2, 'Green', '2020-12-23', NULL);

You can use COALESCE, which returns the first non-null value in its argument list, to fill in blanks.

SELECT COALESCE(<some_value_that_could_be_null>, <some_value_that_will_not_be_null>);
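Here is a minimal sketch of the idea using the question's data. An open-ended status (end_date IS NULL) is treated as still running today. The inline VALUES list copies problem 1's rows so the query runs without the tables. The alias effective_end is illustrative only.

```sql
-- Treat a missing end_date as "still open, ends today".
-- The VALUES list reproduces problem 1's rows from problems_status.
SELECT id_problem,
       status,
       start_date,
       COALESCE(end_date, CURRENT_DATE) AS effective_end  -- first non-NULL argument wins
FROM (VALUES
        (1, 'Red',    DATE '2020-10-17', DATE '2020-10-25'),
        (1, 'Yellow', DATE '2020-10-25', DATE '2020-11-20'),
        (1, 'Red',    DATE '2020-11-20', NULL::date)
     ) AS s(id_problem, status, start_date, end_date)
ORDER BY start_date;
```

The closed statuses keep their own end_date. Only the current Red row gets today's date.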

If you want to force the bounds of the time range into the result set, you can UNION in a result set containing those specific dates.

SELECT ... -- your data query here
UNION ALL
SELECT end_ts -- WHERE end_ts is a timestamptz type

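To UNION, the combined queries must return the same number of columns, with compatible types. You can fill in everything other than the timestamp with NULL cast to the matching type.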
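The following sketch shows the padding rule with made-up bound dates. The first branch returns "real" rows; an inline VALUES list stands in for the data query here. The second branch adds one row that carries only a report end date (2021-01-25 is an arbitrary example). Every other column in that row is a NULL cast to the matching type.

```sql
-- Branch 1: the "real" rows (an inline stand-in for the data query).
SELECT id_problem, status, start_date
FROM (VALUES
        (1, 'Red', DATE '2020-10-17'),
        (2, 'Red', DATE '2020-11-01')
     ) AS s(id_problem, status, start_date)
UNION ALL
-- Branch 2: a forced bound row. Same number of columns, compatible types;
-- columns that have no meaning for this row are typed NULLs.
SELECT NULL::integer, NULL::text, DATE '2021-01-25';
```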

A more concrete example:

WITH data AS -- get raw data
(
    SELECT p.id
         , ps.status
         , ps.start_date
         , COALESCE(ps.end_date, CURRENT_DATE, '01-01-2025'::DATE) AS end_date -- you can fill in NULL values with COALESCE
         , pj.country
         , pj.description
         , MAX(ps.start_date) OVER (PARTITION BY p.id) AS latest_update
      FROM problems p
      JOIN projects pj ON (pj.id = p.id_project)
      JOIN problems_status ps ON (p.id = ps.id_problem)
     UNION ALL -- force bounds in the following
    SELECT NULL::INTEGER -- could be null or a defaulted value
         , NULL::TEXT    -- could be null or a defaulted value
         , start_date -- either as an input param to a function or a hard-coded date
         , end_date   -- either as an input param to a function or a hard-coded date
         , NULL::TEXT
         , NULL::TEXT
         , NULL::DATE
) -- aggregate in the following
SELECT <week> -- you'll have to figure out how you're getting weeks out of the DATE data
     , COUNT(*) FILTER (WHERE status = 'Red')
     , COUNT(*) FILTER (WHERE status = 'Yellow')
     , COUNT(*) FILTER (WHERE status = 'Green')
  FROM data
 WHERE start_date = latest_update
 GROUP BY <week> 
;

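Some of the features used in this query are very powerful. If you are not familiar with them and you will be writing a lot of reporting queries, you should look them up. The main ones are COALESCE, common table expressions (CTEs), window functions and aggregate expressions.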

Aggregate Expressions

WITH Queries (CTEs)

COALESCE

Window Functions
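This self-contained sketch shows what the window function and the FILTER aggregates in the query above do. It uses only the question's status rows, as inline VALUES instead of the real tables. MAX(...) OVER (PARTITION BY ...) tags every row with its problem's latest start_date. The WHERE clause then keeps each problem's current status, which gives the one red and one green described in the question. Note that this counts current statuses only and does not yet spread them across weeks. The CTE names status_rows and ranked are illustrative.

```sql
WITH status_rows(id_problem, status, start_date) AS (
    VALUES (1, 'Red',    DATE '2020-10-17'),
           (1, 'Yellow', DATE '2020-10-25'),
           (1, 'Red',    DATE '2020-11-20'),
           (2, 'Red',    DATE '2020-11-01'),
           (2, 'Yellow', DATE '2020-11-25'),
           (2, 'Red',    DATE '2020-12-22'),
           (2, 'Green',  DATE '2020-12-23')
), ranked AS (
    -- every row gets its problem's most recent start_date
    SELECT s.*,
           MAX(start_date) OVER (PARTITION BY id_problem) AS latest_update
    FROM status_rows s
)
-- conditional aggregation over the current status of each problem
SELECT COUNT(*) FILTER (WHERE status = 'Red')    AS red,     -- 1 (problem 1)
       COUNT(*) FILTER (WHERE status = 'Yellow') AS yellow,  -- 0
       COUNT(*) FILTER (WHERE status = 'Green')  AS green     -- 1 (problem 2)
FROM ranked
WHERE start_date = latest_update;
```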


After you updated your requirements, I wrote a dbfiddle for you to look at here.

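If I understand correctly, your goal is to produce a table of weekly problem-status counts for a specific project over a specific period (from the earliest date in the database to the current date). In addition, if a problem status spans a week, it should be included in that week's counts. This involves 2 periods, the reporting period and the status start/end dates, and checking whether they overlap. There are 4 overlap scenarios to check. Let A be any week in the reporting period and B the start/end of a status. A must end within the reporting period, but B need not, which gives the following:

  • A starts, B starts, A ends, B ends. B overlaps the end of A.
  • A starts, B starts, B ends, A ends. B is completely contained in A.
  • B starts, A starts, B ends, A ends. B overlaps the start of A.
  • B starts, A starts, A ends, B ends. A is completely contained in B.

Fortunately, Postgres provides features that handle all of the above, so the query does not have to check each case separately: DATERANGEs and the overlap operator (&&). The difficult work then becomes defining each week within A. Then apply the overlap operator to the daterange of each week in A and the daterange of B (start_date, end_date). Finally, do conditional aggregation for each overlap detected. See the full example here.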
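Here is a minimal check of the four cases. It assumes A is ISO week 43 of 2020, and the B periods are made-up dates. A single && test returns true for all four overlap shapes, including an open-ended status: a NULL end_date becomes an unbounded upper bound. It returns false when B starts after A ends. The column aliases are illustrative.

```sql
-- A = ISO week 43 of 2020 (Mon 2020-10-19 .. Sun 2020-10-25, inclusive bounds).
-- B = a status period; daterange(start, end) uses an exclusive end by default.
WITH a(wk) AS (
    SELECT daterange(DATE '2020-10-19', DATE '2020-10-25', '[]')
)
SELECT wk && daterange(DATE '2020-10-22', DATE '2020-10-30') AS b_overlaps_end_of_a,    -- true
       wk && daterange(DATE '2020-10-20', DATE '2020-10-22') AS b_inside_a,             -- true
       wk && daterange(DATE '2020-10-17', DATE '2020-10-21') AS b_overlaps_start_of_a,  -- true
       wk && daterange(DATE '2020-10-01', NULL)              AS a_inside_b,             -- true (open-ended B)
       wk && daterange(DATE '2020-10-26', DATE '2020-11-01') AS no_overlap              -- false
FROM a;
```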
-- psql: set the project first, e.g.  \set selected_project 1
with  problem_list( problem_id ) as 
       -- identify the specific problem_ids desired
       (select ps.id 
          from projects p
          join problems ps on(ps.id_project =  p.id)
         where p.id  = :selected_project
       )  --select * from problem_list;
      
  , report_period(srange, erange) as 
       -- generate the first day of week (Mon) from the week of the
       -- oldest start date through the week of current_date
       (select min(first_of_week(ps.start_date))  
             , first_of_week(current_date)
          from problems_status ps
          join problem_list pl 
            on (pl.problem_id = ps.id_problem)
       )  --select * from report_period; 
      
  , weekly_calendar(wk,yr, week_dates) as 
       -- expand the start, end date ranges to week dates (Mon-Sun) 
       -- and identify the week number with year
       (select extract( week from mon)::integer wk
             , extract( isoyear from mon)::integer yr
             , daterange(mon, mon+6, '[]'::text) wk_dates
          from (select generate_series(srange,erange, interval '7 days')::date mon
                  from  report_period
               ) d
       )  -- select * from weekly_calendar; 
   , status_by_week(yr,wk,status) as   
     -- determine where problem start_date, end_date overlaps each calendar week
     -- then where multiple statuses exist for any week keep only the latest
        ( select yr,wk,status  
            from (select  wc.yr,wc.wk,ps.status 
                 --   ,  ps.start_date, wc.week_dates,id_problem 
                      , row_number() over (partition by ps.id_problem,yr,wk order by yr, wk, start_date desc)  rn
                   from problems_status  ps 
                    join problem_list   pl on (pl.problem_id = ps.id_problem)
                    join weekly_calendar wc on (wc.week_dates && daterange(ps.start_date,ps.end_date))  -- actual overlap test  
                 ) ac
           where rn=1
        ) -- select * from status_by_week order by wk;
select 'Project ' || p.id || ': ' || p.description Project
    , to_char(wk,'fm09') || '/' || substr(to_char(yr,'fm0000'),3) "WK"
    , "Red", "Yellow", "Green"
 from projects p
cross join (select sbw.yr,sbw.wk 
                 , count(*) filter (where sbw.status = 'Red')    "Red"
                 , count(*) filter (where sbw.status = 'Yellow') "Yellow"
                 , count(*) filter (where sbw.status = 'Green')  "Green" 
              from status_by_week sbw 
             group by sbw.yr, sbw.wk
           ) sr
where p.id  = :selected_project
order by yr,wk;

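The CTEs and main operations are as follows:

  • problem_list: identifies the problems (id_problem) related to the specified project.

  • report_period: identifies the full reporting period, from start to end.

  • weekly_calendar: generates the start date (Monday) and end date (Sunday) of every week in the reporting period (A above). Along the way it also collects the week of the year and the ISO year.

  • status_by_week: this is the real workhorse, and it does two things. First, it runs each problem through every week of the calendar, building a row for each overlap detected. Then it enforces the "one status" rule: where a problem has several statuses in the same week, only the latest one is kept.

  • Finally, the main select aggregates the statuses into the appropriate buckets and adds some syntactic sugar to get the project name.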
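The query builds the WK label from extract(week ...) and extract(isoyear ...). As a hedged alternative, the ISO week and ISO year can also be formatted directly with to_char's IW and IY patterns. The sketch below uses a fixed reporting period instead of computing it from the data: the week of 2020-10-17 through the week of 2021-01-25, matching the question's sample output. It steps through Mondays with integer arithmetic rather than an interval series. The names mon and week_label are illustrative.

```sql
-- Every Monday from 2020-10-12 (the week containing 2020-10-17) to 2021-01-25,
-- labelled WW/YY using the ISO week (IW) and the last two digits of the ISO year (IY).
SELECT mon,
       to_char(mon, 'IW') || '/' || to_char(mon, 'IY') AS week_label
FROM (SELECT DATE '2020-10-12' + 7 * n AS mon          -- date + integer = date
      FROM generate_series(0, (DATE '2021-01-25' - DATE '2020-10-12') / 7) AS g(n)
     ) weeks
ORDER BY mon;
```

This yields 16 rows, from 42/20 through 53/20 (2020 has an ISO week 53) and then 01/21 through 04/21.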

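Note the function first_of_week(). It is a user-defined function, included in the example and below. I created it some time ago and have found it useful. You are free to use it, but you do so without any claim of suitability or warranty.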

create or replace
function first_of_week(date_in date)
 returns date
language sql
immutable strict
/*
 * Given a date return the first day of the week according to ISO-8601
 * 
 *    ISO-8601 Standard (in short) 
 *    1 All weeks begin on Monday.
 *    2 All Weeks have exactly 7 days.
 *    3 The first week of any year begins on the Monday on or before 4-Jan.
 *      This implies that the last few days of Dec may be in the 
 *      first week of the following year and that the first few 
 *      days of Jan may be in week 52 or 53 of the prior year.
 *      (Not at the same time obviously.)  
 *  
 */ 
as $$
   with wk_adj(l_days) as (values  (array[0,1,2,3,4,5,6]))
   select date_in - l_days[ extract (isodow from date_in)::integer ]
     from wk_adj;
$$;
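For plain dates, PostgreSQL's built-in date_trunc('week', ...) also truncates to the Monday of the ISO week, so it should agree with first_of_week(). Here is a quick sketch using arbitrary sample dates:

```sql
-- date_trunc('week', ...) returns the Monday (ISO-8601) of the week containing d.
SELECT d,
       date_trunc('week', d::timestamp)::date AS monday
FROM (VALUES (DATE '2020-10-17'),   -- Saturday -> 2020-10-12
             (DATE '2020-10-25'),   -- Sunday   -> 2020-10-19
             (DATE '2021-01-03')    -- Sunday   -> 2020-12-28 (ISO week 53 of 2020)
     ) AS t(d);
```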

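In the example, I implemented the query as a SQL function because db<>fiddle seems to have trouble with bind variables and substitution variables. A function also makes the query parameterized (I hate hard-coded values). I added extra data for additional testing, mainly data that should not be selected. I also added an extra status, Pink, to show what happens when the query encounters something other than the 3 expected status values. That is easy to remove: just get rid of the "other" handling.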


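Your observation that "the date ranges cover Monday to Monday, not Monday to Sunday" is incorrect, although it can look that way to someone not used to reading ranges. Take week 43 as an example. If you query its date range, it shows [2020-10-19,2020-10-26), and yes, both dates are Mondays. However, the bracket characters are significant. The leading [ means that date is included, and the trailing ) means that date is excluded. So the condition: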

'[2020-10-19,2020-10-26)'::daterange @> somedate
is the same as
somedate >= '2020-10-19' and somedate < '2020-10-26'

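That is why changing the increment from "mon+6" to "mon+5" fixed week 43 but introduced errors in other weeks. With mon+5, each week runs only from Monday to Saturday, so every Sunday drops out of its week.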
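Here is a small sketch that makes the canonical form visible, using week 43 of 2020 as in the example. The range built with mon+6 and '[]' is displayed as [2020-10-19,2020-10-26). It still contains Sunday 2020-10-25 but not the following Monday. The range built with mon+5 drops the Sunday. The column aliases are illustrative.

```sql
WITH w AS (
    SELECT DATE '2020-10-19' AS mon
)
SELECT daterange(mon, mon + 6, '[]')                      AS week_43,           -- [2020-10-19,2020-10-26)
       daterange(mon, mon + 6, '[]') @> DATE '2020-10-25' AS mon6_has_sunday,   -- true
       daterange(mon, mon + 6, '[]') @> DATE '2020-10-26' AS mon6_has_next_mon, -- false
       daterange(mon, mon + 5, '[]') @> DATE '2020-10-25' AS mon5_has_sunday    -- false: Sunday dropped
FROM w;
```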