Sum of two counts from one table with additional data from another table

Question

I have two tables as follows:

TABLE A

| id | col_a | col_b | user_id |
--------------------------------
| 1  | false | true  | 1       |
| 2  | false | true  | 2       |
| 3  | true  | true  | 2       |
| 4  | true  | true  | 3       |
| 5  | true  | false | 1       |

TABLE B

| id | name    |
----------------
| 1  | Bob     |
| 2  | Jim     |
| 3  | Helen   |
| 4  | Michael |
| 5  | Jen     |

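I want the sum of two counts: the number of true values in col_a plus the number of true values in col_b, grouped by user_id. I also want to join Table B to get each user's name. The result would look like this: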

|user_id|total (col_a + col_b)|name
------------------------------------
| 1     | 2                   | Bob
| 2     | 3                   | Jim
| 3     | 2                   | Helen

So far I get the overall total with this query:

 SELECT
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_a" is true)+
(SELECT COUNT(*) FROM "TABLE_A" WHERE "col_b" is true)
as total

However, I'm not sure how to group these counts by user_id.

Answer 1

select a.user_id, b.name
     -- count() skips NULLs; a CASE without ELSE yields NULL when the condition is not met
     , count(case when a.col_a then 1 end)
     + count(case when a.col_b then 1 end) as total
from "TABLE_A" a
join "TABLE_B" b on a.user_id = b.id
group by a.user_id, b.name

Answer 2

You are double counting Jim. If that isn't intended, since he shows up in only two rows rather than three, perhaps you can do the following:

with cte_a as (
    select col_a as col, user_id
    from "TABLE_A"
    where col_a
    -- UNION drops duplicate (col, user_id) rows; col is always true here
    union -- ALL -- (if you want to double count Jim)
    select col_b as col, user_id
    from "TABLE_A"
    where col_b
)
select b.id as user_id, count(*) as total, b.name
from cte_a
join "TABLE_B" b
on cte_a.user_id = b.id
group by b.id, b.name

If you really do want to double count, use UNION ALL instead of UNION.
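To make the difference between the two set operators concrete, here is a minimal sketch that runs both variants side by side. It assumes PostgreSQL and uses an inline CTE named a (a name chosen only for this sketch) filled with the five sample rows from the question, so it does not depend on the real tables.

```sql
-- Sample rows copied from the question; the CTE name "a" is only for this sketch.
WITH a(id, col_a, col_b, user_id) AS (
   VALUES (1, false, true , 1)
        , (2, false, true , 2)
        , (3, true , true , 2)
        , (4, true , true , 3)
        , (5, true , false, 1)
)
SELECT 'UNION' AS variant, user_id, count(*) AS total
FROM  (
   SELECT col_a AS col, user_id FROM a WHERE col_a
   UNION          -- removes duplicate (col, user_id) rows
   SELECT col_b, user_id FROM a WHERE col_b
   ) u
GROUP  BY user_id
UNION ALL
SELECT 'UNION ALL', user_id, count(*)
FROM  (
   SELECT col_a AS col, user_id FROM a WHERE col_a
   UNION ALL      -- keeps every row
   SELECT col_b, user_id FROM a WHERE col_b
   ) u
GROUP  BY user_id
ORDER  BY variant, user_id;
```

With this data the UNION variant returns a total of 1 for each of the users 1, 2 and 3, because every selected row has col = true and duplicates per user collapse. The UNION ALL variant returns 2, 3 and 2, which matches the result the question asks for.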

Answer 3

Something like this is typically fastest:

SELECT *
FROM   "TABLE_B" b
JOIN  (
   SELECT user_id AS id
        , count(*) FILTER (WHERE col_a)
        + count(*) FILTER (WHERE col_b) AS total
   FROM   "TABLE_A"
   GROUP  BY 1
   ) a USING (id);

When retrieving all rows, aggregate first and join afterwards. That's cheaper. See:

Query with LEFT JOIN not returning rows for count of 0

The aggregate FILTER clause is typically fastest. See:

For absolute performance, is SUM faster or COUNT?
Aggregate columns with additional (distinct) filters

Often, you want to keep users with a total count of 0 in the result. You did say:

get the name of each user.

SELECT b.id AS user_id, b.name, COALESCE(a.total, 0) AS total
FROM   "TABLE_B" b
LEFT   JOIN (
   SELECT user_id AS id
        , count(col_a OR NULL)
        + count(col_b OR NULL) AS total
   FROM   "TABLE_A"
   GROUP  BY 1
   ) a USING (id);
...

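count(col_a OR NULL) is an equivalent alternative: the shortest form, and still fast. (Use the FILTER clause from above for best performance.)
LEFT JOIN keeps all rows from "TABLE_B" in the result.
COALESCE() returns 0 instead of NULL for the total count.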
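
Why count(col_a OR NULL) counts only true values may not be obvious, so here is a small sketch, assuming PostgreSQL. The inline VALUES list and the column name flag are made up for illustration and are not part of the asker's schema. It relies on two facts: true OR NULL is true while false OR NULL is NULL, and count(expression) ignores NULL.

```sql
-- Hypothetical inline data: one true, one false, one NULL.
SELECT count(*)                     AS all_rows     -- 3: counts every row
     , count(flag OR NULL)          AS via_or_null  -- 1: only true survives as non-NULL
     , count(*) FILTER (WHERE flag) AS via_filter   -- 1: same result with FILTER
FROM  (VALUES (true), (false), (NULL::boolean)) AS t(flag);
```

Both forms return the same count; the FILTER form states the intent more directly.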

If col_a and col_b have only few true values, this is typically (much) faster - basically what you had already:

SELECT b.*, COALESCE(aa.ct, 0) + COALESCE(ab.ct, 0) AS total
FROM   "TABLE_B" b
LEFT   JOIN (
   SELECT user_id AS id, count(*) AS ct
   FROM   "TABLE_A"
   WHERE  col_a
   GROUP  BY 1
   ) aa USING (id)
LEFT   JOIN (
   SELECT user_id AS id, count(*) AS ct
   FROM   "TABLE_A"
   WHERE  col_b
   GROUP  BY 1
   ) ab USING (id);

Especially with (small in this case!) partial indexes like:

CREATE INDEX a_true_idx on "TABLE_A" (user_id) WHERE col_a;
CREATE INDEX b_true_idx on "TABLE_A" (user_id) WHERE col_b;

Aside: use legal, lower-case, unquoted names in Postgres to make your life simpler.

Are PostgreSQL column names case-sensitive?
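
As a brief illustration of the aside, the sketch below shows how PostgreSQL folds unquoted identifiers to lower case while quoted ones keep their exact spelling. The temporary tables "TABLE_X" and table_y are hypothetical names used only here.

```sql
-- Hypothetical temp tables, used only to show identifier folding.
CREATE TEMP TABLE "TABLE_X" (id int);   -- quoted: name stays upper case
CREATE TEMP TABLE table_y   (id int);   -- unquoted: stored as table_y

SELECT count(*) FROM "TABLE_X";  -- works, but quotes are required on every use
SELECT count(*) FROM Table_Y;    -- works, folded to table_y
SELECT count(*) FROM TABLE_Y;    -- works, folded to table_y
-- SELECT count(*) FROM TABLE_X; -- would fail: folded to table_x, which does not exist
```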
