Sum of two counts from one table with additional data from another table
I have two tables as follows:
TABLE A
| id | col_a | col_b | user_id |
--------------------------------
| 1 | false | true | 1 |
| 2 | false | true | 2 |
| 3 | true | true | 2 |
| 4 | true | true | 3 |
| 5 | true | false | 1 |
TABLE B
| id | name |
--------------
| 1 | Bob |
| 2 | Jim |
| 3 | Helen |
| 4 | Michael|
| 5 | Jen |
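I want to get the sum of two counts: the number of true values in col_a and the number of true values in col_b. I want to group the data by user_id. I also want to join TABLE B to get the name of each user. The result would look like this: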
|user_id|total (col_a + col_b)|name
------------------------------------
| 1 | 2 | Bob
| 2 | 3 | Jim
| 3 | 2 | Helen
So far, I got the total with the following query:
SELECT
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_a" is true)+
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_b" is true)
as total
However, I'm not sure how to group these counts by user_id.
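To make the problem concrete, here is a minimal sketch that loads the sample data into an in-memory SQLite database through Python's built-in sqlite3 module. Assumptions: SQLite stands in for PostgreSQL, booleans are stored as 1/0, and the tables are renamed to the hypothetical table_a and table_b. It shows that the query above produces one grand total over the whole table, not one total per user:

```python
import sqlite3

# Sample data from the question. Assumption: SQLite stands in for PostgreSQL,
# booleans are stored as 1/0, and the tables are renamed table_a / table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
""")

# The query from the question: two scalar subqueries over the whole table,
# so nothing is grouped and a single number comes back.
grand_total = conn.execute("""
    SELECT (SELECT count(*) FROM table_a WHERE col_a IS TRUE)
         + (SELECT count(*) FROM table_a WHERE col_b IS TRUE) AS total
""").fetchone()[0]
print(grand_total)  # 3 true values in col_a + 4 in col_b = 7
```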
select user_id, name
, count(case when col_a = true then 1 end)
+ count(case when col_b = true then 1 end) as total
from "TABLE_A" a
join "TABLE_B" b on a.user_id = b.id
group by user_id, name
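A minimal sketch that runs this join-then-aggregate query on the sample data, assuming SQLite (via Python's sqlite3) as a stand-in for PostgreSQL, booleans stored as 1/0, and the hypothetical table names table_a and table_b. count() skips NULL, and a CASE without ELSE yields NULL for non-matching rows, so each count() only counts the true values:

```python
import sqlite3

# Sample data from the question. Assumption: SQLite stands in for PostgreSQL,
# booleans are stored as 1/0, and the tables are renamed table_a / table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
""")

# Join first, then count the true values of each column per user.
rows = conn.execute("""
    SELECT a.user_id, b.name,
           count(CASE WHEN a.col_a = TRUE THEN 1 END)
         + count(CASE WHEN a.col_b = TRUE THEN 1 END) AS total
    FROM table_a a
    JOIN table_b b ON a.user_id = b.id
    GROUP BY a.user_id, b.name
    ORDER BY a.user_id
""").fetchall()
print(rows)  # [(1, 'Bob', 2), (2, 'Jim', 3), (3, 'Helen', 2)]
```

This matches the expected result from the question; users without any rows in table_a (Michael, Jen) drop out because of the inner join.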
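You are double counting Jim: he has only two rows in TABLE A but gets a total of three (and likewise Helen, with one row but a total of two). If that is not intended, you could do something like the following: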
with cte_A as (
select id, user_id -- keep the row id so UNION only removes a row matched by both branches
from A
where col_a = true
union -- ALL -- (if you want to double count Jim)
select id, user_id
from A
where col_b = true
)
select B.id as user_id, count(*) as total, B.name
from cte_A
join B
on cte_A.user_id = B.id
group by B.id, B.name
If you really do want to double count, use UNION ALL instead of UNION.
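A minimal sketch comparing UNION and UNION ALL for this CTE approach, assuming SQLite (via Python's sqlite3) in place of PostgreSQL, booleans stored as 1/0, and the hypothetical table names table_a and table_b. UNION removes duplicate rows, so the CTE has to carry the row id: a CTE that selected only (col, user_id) would collapse each user to a single row, because col is always true there.

```python
import sqlite3

# Sample data from the question. Assumption: SQLite stands in for PostgreSQL,
# booleans are stored as 1/0, and the tables are renamed table_a / table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
""")

def totals(set_op):
    """Per-user row counts after combining both branches with set_op.

    set_op is only ever one of the two literals used below, never user input.
    """
    sql = f"""
        WITH cte_a AS (
            SELECT id, user_id FROM table_a WHERE col_a = TRUE
            {set_op}
            SELECT id, user_id FROM table_a WHERE col_b = TRUE
        )
        SELECT b.id AS user_id, count(*) AS total, b.name
        FROM cte_a
        JOIN table_b b ON cte_a.user_id = b.id
        GROUP BY b.id, b.name
        ORDER BY b.id
    """
    return conn.execute(sql).fetchall()

union_rows = totals("UNION")          # a row with both flags true counts once
union_all_rows = totals("UNION ALL")  # a row with both flags true counts twice
print(union_rows)      # [(1, 2, 'Bob'), (2, 2, 'Jim'), (3, 1, 'Helen')]
print(union_all_rows)  # [(1, 2, 'Bob'), (2, 3, 'Jim'), (3, 2, 'Helen')]
```

Only the UNION ALL variant reproduces the totals the question asks for.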
Something like this is typically fastest:
SELECT *
FROM "TABLE_B" b
JOIN (
SELECT user_id AS id
, count(*) FILTER (WHERE col_a)
+ count(*) FILTER (WHERE col_b) AS total
FROM "TABLE_A"
GROUP BY 1
) a USING (id);
When fetching all rows, aggregate first, join later. That's cheaper. See:
- Query with LEFT JOIN not returning rows for count of 0
The aggregate FILTER clause is typically fastest. See:
- For absolute performance, is SUM faster or COUNT?
- Aggregate columns with additional (distinct) filters
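A minimal sketch of the aggregate-first FILTER query on the sample data, assuming SQLite (via Python's sqlite3; aggregate FILTER needs SQLite 3.30.0 or later) in place of PostgreSQL, booleans stored as 1/0, and the hypothetical table names table_a and table_b. It also runs a join-first variant to show that aggregating before the join returns the same rows; it does not measure speed.

```python
import sqlite3

# Sample data from the question. Assumption: SQLite stands in for PostgreSQL,
# booleans are stored as 1/0, and the tables are renamed table_a / table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
""")

# Aggregate table_a down to one row per user first, then join the result.
aggregate_first = conn.execute("""
    SELECT b.id, b.name, a.total
    FROM table_b b
    JOIN (
        SELECT user_id AS id,
               count(*) FILTER (WHERE col_a)
             + count(*) FILTER (WHERE col_b) AS total
        FROM table_a
        GROUP BY 1
    ) a USING (id)
    ORDER BY b.id
""").fetchall()

# Join every table_a row to table_b first, aggregate afterwards.
join_first = conn.execute("""
    SELECT b.id, b.name,
           count(*) FILTER (WHERE a.col_a)
         + count(*) FILTER (WHERE a.col_b) AS total
    FROM table_b b
    JOIN table_a a ON a.user_id = b.id
    GROUP BY b.id, b.name
    ORDER BY b.id
""").fetchall()

print(aggregate_first)                # [(1, 'Bob', 2), (2, 'Jim', 3), (3, 'Helen', 2)]
print(aggregate_first == join_first)  # True
```

The sketch selects explicit columns instead of SELECT * only to keep the output shape obvious.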
Typically, you'd want to keep total counts of 0 in the result. You did say:
get the name of each user.
SELECT b.id AS user_id, b.name, COALESCE(a.total, 0) AS total
FROM "TABLE_B" b
LEFT JOIN (
SELECT user_id AS id
, count(col_a OR NULL)
+ count(col_b OR NULL) AS total
FROM "TABLE_A"
GROUP BY 1
) a USING (id);
...
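count(col_a OR NULL) is an equivalent alternative: the shortest, and still fast. (Use the FILTER clause from above for best performance.)
LEFT JOIN keeps all rows from "TABLE_B" in the result.
COALESCE() returns 0 instead of NULL for the total count.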
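A minimal sketch of the LEFT JOIN / COALESCE variant on the sample data, assuming SQLite (via Python's sqlite3) in place of PostgreSQL, booleans stored as 1/0, and the hypothetical table names table_a and table_b. col_a OR NULL is true for true values and NULL otherwise, and count() skips NULL, so it counts only the true values:

```python
import sqlite3

# Sample data from the question. Assumption: SQLite stands in for PostgreSQL,
# booleans are stored as 1/0, and the tables are renamed table_a / table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');
""")

# LEFT JOIN keeps every user from table_b; COALESCE turns the missing
# total of users without rows in table_a from NULL into 0.
rows = conn.execute("""
    SELECT b.id AS user_id, b.name, COALESCE(a.total, 0) AS total
    FROM table_b b
    LEFT JOIN (
        SELECT user_id AS id,
               count(col_a OR NULL) + count(col_b OR NULL) AS total
        FROM table_a
        GROUP BY 1
    ) a USING (id)
    ORDER BY b.id
""").fetchall()
print(rows)
# [(1, 'Bob', 2), (2, 'Jim', 3), (3, 'Helen', 2), (4, 'Michael', 0), (5, 'Jen', 0)]
```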
If col_a and col_b have only few true values, this is typically (much) faster, basically what you had already:
SELECT b.*, COALESCE(aa.ct, 0) + COALESCE(ab.ct, 0) AS total
FROM "TABLE_B" b
LEFT JOIN (
SELECT user_id AS id, count(*) AS ct
FROM "TABLE_A"
WHERE col_a
GROUP BY 1
) aa USING (id)
LEFT JOIN (
SELECT user_id AS id, count(*) AS ct
FROM "TABLE_A"
WHERE col_b
GROUP BY 1
) ab USING (id);
Especially with partial indexes (tiny in this case!) like:
CREATE INDEX a_true_idx on "TABLE_A" (user_id) WHERE col_a;
CREATE INDEX b_true_idx on "TABLE_A" (user_id) WHERE col_b;
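A minimal sketch of the two-subquery variant together with the partial indexes, assuming SQLite (via Python's sqlite3, which also accepts CREATE INDEX ... WHERE) in place of PostgreSQL, booleans stored as 1/0, and the hypothetical table names table_a and table_b. It checks results only; whether the planner picks the indexes is not verified here. The joins use ON instead of USING so the sketch does not depend on how a chained USING resolves the shared id column.

```python
import sqlite3

# Sample data from the question. Assumption: SQLite stands in for PostgreSQL,
# booleans are stored as 1/0, and the tables are renamed table_a / table_b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a INTEGER,
                          col_b INTEGER, user_id INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 0, 1, 1), (2, 0, 1, 2), (3, 1, 1, 2),
                               (4, 1, 1, 3), (5, 1, 0, 1);
    INSERT INTO table_b VALUES (1, 'Bob'), (2, 'Jim'), (3, 'Helen'),
                               (4, 'Michael'), (5, 'Jen');

    -- Partial indexes: each one holds only the rows where its flag is true.
    CREATE INDEX a_true_idx ON table_a (user_id) WHERE col_a;
    CREATE INDEX b_true_idx ON table_a (user_id) WHERE col_b;
""")

# Count each flag in its own filtered subquery, then LEFT JOIN both onto
# table_b so users without any true values still appear with 0.
rows = conn.execute("""
    SELECT b.*, COALESCE(aa.ct, 0) + COALESCE(ab.ct, 0) AS total
    FROM table_b b
    LEFT JOIN (
        SELECT user_id AS id, count(*) AS ct
        FROM table_a WHERE col_a GROUP BY 1
    ) aa ON aa.id = b.id
    LEFT JOIN (
        SELECT user_id AS id, count(*) AS ct
        FROM table_a WHERE col_b GROUP BY 1
    ) ab ON ab.id = b.id
    ORDER BY b.id
""").fetchall()
print(rows)
# [(1, 'Bob', 2), (2, 'Jim', 3), (3, 'Helen', 2), (4, 'Michael', 0), (5, 'Jen', 0)]
```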
Aside: use legal, lower-case, unquoted names in Postgres to make your life easier.
- Are PostgreSQL column names case-sensitive?