How to merge 2 queries together in the same query but different columns?

I'm running two different queries on PostgreSQL 9.3.9. They give different results, but both are grouped by "month-year". How can I write a single query that gives me the same data in one table?

Query 1:

SELECT CONCAT(EXTRACT(MONTH FROM startedPayingDate), '-', 
            EXTRACT(YEAR FROM startedPayingDate)) AS "Month", 
 COUNT(*) AS "Total AB Paying Customers"
FROM (       
 SELECT cm.customer_id, MIN(cm.created_at) AS startedPayingDate 
 FROM customerusermap AS cm, users as u
 WHERE cm.customer_id = u.customer_id AND cm.user_id<>u.id 
 GROUP BY cm.customer_id ) AS t
GROUP BY 1, EXTRACT(MONTH FROM startedPayingDate), EXTRACT(YEAR FROM startedPayingDate)
ORDER BY EXTRACT(YEAR FROM startedPayingDate), EXTRACT(MONTH FROM startedPayingDate);

The result looks like this:

Month  | Total AB Paying Customers
---------------------------------
3-2014 | 2
4-2014 | 4

Query 2:

SELECT concat(extract(MONTH from u.created_at),'-',extract(year from u.created_at)) as "Month", 
count(u.email) as "Total SMB Paying Customers"
FROM customerusermap AS cm, users AS u 
WHERE cm.customer_id = u.customer_id AND cm.user_id = u.id AND u.paid_status = 'paying' 
GROUP by 1,extract(month from u.created_at),extract(year from u.created_at)
order by extract(year from u.created_at),extract(month from u.created_at);

The result looks like this:

Month  | Total SMB Paying Customers
-----------------------------------
2-2014 | 3
3-2014 | 8
4-2014 | 5

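Desired result

I'd like to merge these two queries into a result like the one shown here, ordered by year and month (i.e. oldest to newest):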

Month  | Total AB Paying Customers | Total SMB Paying Customers | Total | Cumulative
-------------------------------------------------------------------------------------
2-2014 |           0               |            3               |    3  |   3
3-2014 |           2               |            8               |    10 |   13
4-2014 |           4               |            5               |    9  |   22

Table definitions

CREATE TABLE users (
id serial NOT NULL,
firstname character varying(255) NOT NULL,
lastname character varying(255) NOT NULL,
email character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
customer_id character varying(255) DEFAULT NULL::character varying,
companyname character varying(255),
primary_user_id integer,
paid_status character varying(255),  -- updated from comment
CONSTRAINT users_pkey PRIMARY KEY (id),
CONSTRAINT primary_user_id_fk FOREIGN KEY (primary_user_id) REFERENCES users (id),
CONSTRAINT users_uuid_key UNIQUE (uuid)
);

And the customer-user mapping table looks like:

CREATE TABLE customerusermap (
id serial NOT NULL,
user_id integer NOT NULL,
customer_id character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
CONSTRAINT customerusermap_pkey PRIMARY KEY (id),
CONSTRAINT customerusermap_user_id_fkey FOREIGN KEY (user_id) REFERENCES users (id),
CONSTRAINT customerusermap_user_id_key UNIQUE (user_id)
);

General solution

The key feature is a FULL OUTER JOIN, but NULL values have to be handled properly:

SELECT *
     , "Total AB Paying Customers" + "Total SMB Paying Customers" AS "Total"
     , sum("Total AB Paying Customers" + "Total SMB Paying Customers")
         OVER (ORDER BY "Month") AS "Cumulative"
FROM  (
   SELECT "Month"
        , COALESCE(q1."Total AB Paying Customers", 0)  AS "Total AB Paying Customers"
        , COALESCE(q2."Total SMB Paying Customers", 0) AS "Total SMB Paying Customers"
   FROM      (<query1>) q1
   FULL JOIN (<query2>) q2 USING ("Month")
   ) sub;

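  • sum() is used as a window function to get the cumulative sum.

  • The extra subquery layer is only for convenience, so we don't have to repeat COALESCE() several times.

  • The query can be simplified further: format the month in the outer SELECT, etc.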
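
To see the FULL JOIN, COALESCE() and window-function pieces working together without the real tables, here is a minimal sketch. The two CTEs q1 and q2 and the column names mon, ct_ab and ct_smb are stand-ins made up for illustration. They hold the counts from the sample results in the question, keyed on a date rather than on the formatted "Month" text so that ORDER BY sorts chronologically.

```sql
-- Hypothetical inline data standing in for the results of the two queries
WITH q1(mon, ct_ab) AS (
   VALUES (date '2014-03-01', 2), (date '2014-04-01', 4)
)
, q2(mon, ct_smb) AS (
   VALUES (date '2014-02-01', 3), (date '2014-03-01', 8), (date '2014-04-01', 5)
)
SELECT mon, ct_ab, ct_smb
     , ct_ab + ct_smb                          AS total
     , sum(ct_ab + ct_smb) OVER (ORDER BY mon) AS cumulative  -- running total
FROM  (
   -- FULL JOIN keeps months present on only one side; COALESCE turns the gaps into 0
   SELECT mon
        , COALESCE(q1.ct_ab, 0)  AS ct_ab
        , COALESCE(q2.ct_smb, 0) AS ct_smb
   FROM   q1
   FULL   JOIN q2 USING (mon)
   ) sub
ORDER  BY mon;
```

With this sample data, February appears with ct_ab = 0 although q1 has no row for it, and the cumulative column comes out as 3, 13, 22, matching the desired result in the question.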

Optimized query

Based on the setup you added:

SELECT to_char(mon, 'FMMM-YYYY') AS "Month"
     , ct_ab                     AS "Total AB Paying Customers"
     , ct_smb                    AS "Total SMB Paying Customers"
     , ct_ab + ct_smb            AS "Total"
     , sum(ct_ab + ct_smb) OVER (ORDER BY mon)::int AS "Cumulative"
FROM  (
   SELECT mon, COALESCE(q1.ct_ab, 0) AS ct_ab, COALESCE(q2.ct_smb, 0) AS ct_smb
   FROM  (
      SELECT date_trunc('month', start_date) AS mon, count(*)::int AS ct_ab
      FROM  (       
         SELECT cm.customer_id, min(cm.created_at) AS start_date 
         FROM   customerusermap cm
         JOIN   users u USING (customer_id)
         WHERE  cm.user_id <> u.id 
         GROUP  BY 1
         ) t
      GROUP  BY 1
      ) q1
   FULL JOIN (
      SELECT date_trunc('month', u.created_at) AS mon, count(*)::int AS ct_smb
      FROM   customerusermap cm
      JOIN   users u USING (customer_id)
      WHERE  cm.user_id = u.id AND u.paid_status = 'paying' 
      GROUP  BY 1
      ) q2 USING (mon)
   ) sub
ORDER  BY mon;

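Major points

  • Use to_char() to format your month any way you like, and do it just once at the end. The template pattern FMMM produces month numbers without leading zeros, like in your original.

  • Use date_trunc() to truncate your timestamp without time zone to month resolution (the first timestamp of the month, but that makes no difference here).

  • I added ORDER BY mon to get the sort order you asked for in your comment. This works as expected because the column mon is still a timestamp (not yet converted to a string (text)).

  • Since u.email is defined NOT NULL, count(*) does the same as count(u.email) in this context and is a bit cheaper.

  • Use explicit JOIN syntax. Same performance, but much clearer.

  • I cast the aggregated counts to integer. This is entirely optional (assuming you won't run into integer overflow). That way you get all integers in the result instead of bigint and numeric.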
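
A few of the points above can be checked in isolation. The following statements are a small sketch with literal values picked only for illustration. They show what date_trunc() and the FMMM pattern return, why sorting on the formatted text would go wrong, and which types the aggregates produce without the casts.

```sql
-- Month truncation and formatting
SELECT date_trunc('month', timestamp '2014-03-17 10:30') AS mon       -- 2014-03-01 00:00:00
     , to_char(timestamp '2014-03-17 10:30', 'MM-YYYY')   AS padded    -- 03-2014
     , to_char(timestamp '2014-03-17 10:30', 'FMMM-YYYY') AS unpadded; -- 3-2014

-- Formatted text does not sort chronologically: October would come before February
SELECT '10-2014'::text < '2-2014'::text AS oct_sorts_first;            -- true

-- Result types of the aggregates without explicit casts
SELECT pg_typeof(count(*))       AS count_type       -- bigint
     , pg_typeof(sum(x))         AS sum_int_type     -- bigint
     , pg_typeof(sum(x::bigint)) AS sum_bigint_type  -- numeric
FROM  (VALUES (1), (2)) t(x);
```

The last statement is the reason for the ::int casts: summing the bigint results of count() would otherwise yield numeric in the "Cumulative" column.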

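Compared with your original, you'll find this is shorter and faster.

If performance matters, make sure you have indexes on the relevant columns. If there are many entries in users per entry in customerusermap, there are more sophisticated options with JOIN LATERAL that can make your query faster: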

  • Optimize GROUP BY query to retrieve latest record per user
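
As for the indexes mentioned above, which columns count as "relevant" is a judgment call. The sketch below assumes it is the join column customer_id, since the table definitions in the question only show indexes on id and user_id (through the primary key and unique constraints). The index names are made up.

```sql
-- Assumed candidates: the join column on both tables (index names are hypothetical)
CREATE INDEX users_customer_id_idx           ON users (customer_id);
CREATE INDEX customerusermap_customer_id_idx ON customerusermap (customer_id);
```

Whether the planner actually uses them depends on table sizes and data distribution. For aggregates over whole tables a sequential scan can still win, so it is worth checking with EXPLAIN ANALYZE.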

If you want to include months without any activity, LEFT JOIN to a complete list of months instead. Example:

  • PostgreSQL: running count of rows for a query 'by minute'
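
A minimal sketch of that idea, assuming a fixed range from January to April 2014 and a small inline table c standing in for one of the monthly counts (both are made up for illustration):

```sql
-- generate_series() builds one row per month; LEFT JOIN keeps months without a count
SELECT m.mon
     , COALESCE(c.ct, 0) AS ct
FROM   generate_series(timestamp '2014-01-01'
                     , timestamp '2014-04-01'
                     , interval  '1 month') AS m(mon)
LEFT   JOIN (
   VALUES (timestamp '2014-02-01', 3)
        , (timestamp '2014-04-01', 5)
   ) c(mon, ct) USING (mon)
ORDER  BY m.mon;
```

January and March have no row in c and come back with ct = 0. In the real query, the lower and upper bound would more likely be derived from the data than hard-coded.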