How to merge 2 queries together in same query but different columns?
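I am running two different queries on PostgreSQL 9.3.9. They return different results, but both are grouped by "month-year". How can I write a single query that gives me the same data in one table?
Query 1: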
SELECT CONCAT(EXTRACT(MONTH FROM startedPayingDate), '-',
EXTRACT(YEAR FROM startedPayingDate)) AS "Month",
COUNT(*) AS "Total AB Paying Customers"
FROM (
SELECT cm.customer_id, MIN(cm.created_at) AS startedPayingDate
FROM customerusermap AS cm, users as u
WHERE cm.customer_id = u.customer_id AND cm.user_id<>u.id
GROUP BY cm.customer_id ) AS t
GROUP BY 1, EXTRACT(MONTH FROM startedPayingDate), EXTRACT(YEAR FROM startedPayingDate)
ORDER BY EXTRACT(YEAR FROM startedPayingDate), EXTRACT(MONTH FROM startedPayingDate);
The result looks like this:
Month  | Total AB Paying Customers
-------+--------------------------
3-2014 | 2
4-2014 | 4
Query 2:
SELECT concat(extract(MONTH from u.created_at),'-',extract(year from u.created_at)) as "Month",
count(u.email) as "Total SMB Paying Customers"
FROM customerusermap AS cm, users AS u
WHERE cm.customer_id = u.customer_id AND cm.user_id = u.id AND u.paid_status = 'paying'
GROUP by 1,extract(month from u.created_at),extract(year from u.created_at)
order by extract(year from u.created_at),extract(month from u.created_at);
The result looks like this:
Month  | Total SMB Paying Customers
-------+---------------------------
2-2014 | 3
3-2014 | 8
4-2014 | 5
Desired result
I would like to merge these two queries into a result like the one shown below, sorted by year and month (i.e. oldest to newest):
Month  | Total AB Paying Customers | Total SMB Paying Customers | Total | Cumulative
-------+---------------------------+----------------------------+-------+-----------
2-2014 |                         0 |                          3 |     3 |          3
3-2014 |                         2 |                          8 |    10 |         13
4-2014 |                         4 |                          5 |     9 |         22
Table definitions
CREATE TABLE users (
id serial NOT NULL,
firstname character varying(255) NOT NULL,
lastname character varying(255) NOT NULL,
email character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
customer_id character varying(255) DEFAULT NULL::character varying,
companyname character varying(255),
primary_user_id integer,
paid_status character varying(255), -- updated from comment
CONSTRAINT users_pkey PRIMARY KEY (id),
CONSTRAINT primary_user_id_fk FOREIGN KEY (primary_user_id) REFERENCES users (id),
CONSTRAINT users_uuid_key UNIQUE (uuid)
)
And the customer-user map table looks like:
CREATE TABLE customerusermap (
id serial NOT NULL,
user_id integer NOT NULL,
customer_id character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
CONSTRAINT customerusermap_pkey PRIMARY KEY (id),
CONSTRAINT customerusermap_user_id_fkey FOREIGN KEY (user_id) REFERENCES users (id),
CONSTRAINT customerusermap_user_id_key UNIQUE (user_id)
);
General solution
The key feature is a FULL OUTER JOIN, but take care to handle NULL values properly:
SELECT *
, "Total AB Paying Customers" + "Total SMB Paying Customers" AS "Total"
, sum("Total AB Paying Customers" + "Total SMB Paying Customers")
OVER (ORDER BY "Month") AS "Cumulative"
FROM (
SELECT "Month"
, COALESCE(q1."Total AB Paying Customers", 0) AS "Total AB Paying Customers"
, COALESCE(q2."Total SMB Paying Customers", 0) AS "Total SMB Paying Customers"
FROM (<query1>) q1
FULL JOIN (<query2>) q2 USING ("Month")
) sub;
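Use sum() as a window function to get the cumulative sum.
The extra subquery layer is only for convenience, so we don't have to repeat COALESCE() for every reference to the two counts.
The query can be simplified further: format the month once in the outer SELECT, which also lets the window function and the final ORDER BY sort on the actual timestamp rather than on the text label, etc.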
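To make the pattern concrete, here is a minimal, self-contained sketch. It is not the final query: the CTEs q1 and q2 are stand-ins for the two original queries, filled with the sample rows from the question via VALUES, and the column names mon, ct_ab and ct_smb are chosen only for this illustration.

```sql
-- Stand-ins for the two original queries, using the sample rows from the question.
WITH q1(mon, ct_ab) AS (
   VALUES (timestamp '2014-03-01', 2)
        , (timestamp '2014-04-01', 4)
   )
, q2(mon, ct_smb) AS (
   VALUES (timestamp '2014-02-01', 3)
        , (timestamp '2014-03-01', 8)
        , (timestamp '2014-04-01', 5)
   )
SELECT to_char(mon, 'FMMM-YYYY') AS "Month"   -- format only at the very end
     , ct_ab
     , ct_smb
     , ct_ab + ct_smb AS "Total"
     , sum(ct_ab + ct_smb) OVER (ORDER BY mon) AS "Cumulative"
FROM  (
   SELECT mon
        , COALESCE(q1.ct_ab, 0)  AS ct_ab    -- month missing from q1 counts as 0
        , COALESCE(q2.ct_smb, 0) AS ct_smb   -- month missing from q2 counts as 0
   FROM   q1
   FULL   JOIN q2 USING (mon)                -- keep months present in either query
   ) sub
ORDER  BY mon;
```

With this sample data the rows come out as 2-2014 (0, 3, 3, 3), 3-2014 (2, 8, 10, 13) and 4-2014 (4, 5, 9, 22), matching the desired result in the question. Without the COALESCE() calls, the February row would have NULL for ct_ab and therefore NULL for "Total".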
Optimized query
Based on the setup you added:
SELECT to_char(mon, 'FMMM-YYYY') AS "Month"
, ct_ab AS "Total AB Paying Customers"
, ct_smb AS "Total SMB Paying Customers"
, ct_ab + ct_smb AS "Total"
, sum(ct_ab + ct_smb) OVER (ORDER BY mon)::int AS "Cumulative"
FROM (
SELECT mon, COALESCE(q1.ct_ab, 0) AS ct_ab, COALESCE(q2.ct_smb, 0) AS ct_smb
FROM (
SELECT date_trunc('month', start_date) AS mon, count(*)::int AS ct_ab
FROM (
SELECT cm.customer_id, min(cm.created_at) AS start_date
FROM customerusermap cm
JOIN users u USING (customer_id)
WHERE cm.user_id <> u.id
GROUP BY 1
) t
GROUP BY 1
) q1
FULL JOIN (
SELECT date_trunc('month', u.created_at) AS mon, count(*)::int AS ct_smb
FROM customerusermap cm
JOIN users u USING (customer_id)
WHERE cm.user_id = u.id AND u.paid_status = 'paying'
GROUP BY 1
) q2 USING (mon)
   ) sub
ORDER BY mon;
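Key points
- Use to_char() to format your month any way you like, and do it just once at the end. The template pattern FMMM produces the month number without a leading zero, like your original.
- Use date_trunc() to truncate your timestamp without time zone to month resolution (the first timestamp of the month, but that makes no difference here).
- I added ORDER BY mon to get the sort order you asked for in your comment. This works as expected because the column mon is still a timestamp (not yet converted to a string (text)).
- Since u.email is defined NOT NULL, count(*) does the same as count(u.email) in this context, and is a bit cheaper.
- Use explicit JOIN syntax. The performance is the same, but it is much clearer.
- I cast the aggregated counts to integer. This is entirely optional (assuming you won't get an integer overflow). That way all numbers in the result are integer instead of a mix of bigint and numeric.
Compared to your original, you will find this shorter and faster.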
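The formatting and truncation points above can be checked in isolation. The following is a small sketch on a single made-up timestamp; the alias names are only for illustration.

```sql
-- One arbitrary timestamp, to compare the template patterns and date_trunc().
SELECT to_char(ts, 'MM-YYYY')   AS padded      -- '02-2014'
     , to_char(ts, 'FMMM-YYYY') AS unpadded    -- '2-2014' (FM suppresses the leading zero)
     , date_trunc('month', ts)  AS mon         -- 2014-02-01 00:00:00, still a timestamp
FROM  (VALUES (timestamp '2014-02-17 13:45:00')) v(ts);
```

Because mon stays a timestamp, sorting on it is chronological. Sorting on the formatted text would instead compare strings, so a label like '10-2014' would sort before '2-2014'.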
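If performance matters, make sure you have indexes on the relevant columns. If there are many entries in users for each entry in customerusermap, there are more sophisticated options with JOIN LATERAL that can make your query faster:
- Optimize GROUP BY query to retrieve latest record per user
If you want to include months without any activity, LEFT JOIN to a complete list of months instead. Example:
- PostgreSQL: running count of rows for a query 'by minute'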
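As a rough sketch of that last idea (not taken from the linked answer): generate_series() can produce the complete list of months, and a LEFT JOIN then keeps months that have no rows. The date range and the inline counts below are made up for illustration; in practice the subquery c would be one of the aggregated queries above.

```sql
-- Complete month list on the left, hypothetical per-month counts on the right.
SELECT to_char(mon, 'FMMM-YYYY') AS "Month"
     , COALESCE(c.ct, 0)         AS ct      -- months without activity count as 0
FROM   generate_series(timestamp '2014-01-01'
                     , timestamp '2014-04-01'
                     , interval  '1 month') AS m(mon)
LEFT   JOIN (
   VALUES (timestamp '2014-02-01', 3)
        , (timestamp '2014-04-01', 5)
   ) c(mon, ct) USING (mon)
ORDER  BY mon;
```

With these inputs, January and March appear with a count of 0 instead of being dropped.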