SQL get counts of unique values

How can I get the different "count" columns for this in-memory database example using sqlite3? I am using version 3.27.2.

Example database

CREATE TABLE events (
    id1, 
    id2, 
    id3, 
    PRIMARY KEY (id1, id2)
);

INSERT INTO events (id1, id2, id3)
VALUES 
   (1,1,99),
   (1,2,99),
   (1,3,52),
   (2,1,6),
   (2,2,7),
   (2,3,8)
;

.mode columns
.header on
SELECT * FROM events;

Desired printout

Partial success: the following works for the first two new columns.

SELECT id1, count(id3) AS total_count, count(DISTINCT id3) AS unique_count
FROM events
GROUP BY id1;

What is the best way to get the last column? The following returns "Error: no such column: total_count":

SELECT id1, count(id3) AS total_count, count(DISTINCT id3) AS unique_count, (total_count - unique_count) AS repeated_count
FROM events
GROUP BY id1;
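SQLite does not let one result column refer to an alias defined in the same SELECT list, which is why total_count is reported as missing. A minimal sketch of one workaround is to repeat the aggregate expressions instead of their aliases. It is run here through Python's standard sqlite3 module, which is an assumption made for illustration rather than part of the question's setup. Note that this only subtracts the two counts; whether that is the intended meaning of repeated_count is a separate matter.

```python
import sqlite3

# In-memory database with the example table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id1, id2, id3, PRIMARY KEY (id1, id2));
INSERT INTO events (id1, id2, id3) VALUES
   (1,1,99), (1,2,99), (1,3,52),
   (2,1,6),  (2,2,7),  (2,3,8);
""")

# Repeat the aggregates rather than referencing their aliases.
rows = conn.execute("""
SELECT id1,
       count(id3)                       AS total_count,
       count(DISTINCT id3)              AS unique_count,
       count(id3) - count(DISTINCT id3) AS repeated_count
FROM events
GROUP BY id1
ORDER BY id1
""").fetchall()

for row in rows:
    print(row)
```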

Maybe try a CTE. I have not verified the syntax, but judging by the SQLite documentation this looks like a valid option.

WITH X AS
(
SELECT id1, count(id3) AS total_count, count(DISTINCT id3) AS unique_count
FROM events
GROUP BY id1
)
SELECT id1, total_count, unique_count, (total_count - unique_count) AS repeated_count
FROM X;

It's not that easy :-)

(1,1,99), (1,2,99), (1,3,52)

One ID is repeated (99).

(1,1,99), (1,2,99), (1,3,52), (1,4,99)

Again one ID is repeated (still 99).

(1,1,99), (1,2,99), (1,3,52), (1,4,52)

Two IDs are repeated (52 and 99).
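The three cases above can be checked with a short sketch. The helper name compare is hypothetical, and Python's standard sqlite3 module is assumed only as a convenient way to run the SQL. For each sample it returns the plain difference between total and distinct counts next to the number of distinct id3 values that occur more than once. All sample rows share id1 = 1, so no grouping by id1 is needed here.

```python
import sqlite3

def compare(rows):
    """Return (total - unique, number of distinct id3 values seen more than once)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id1, id2, id3, PRIMARY KEY (id1, id2))")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    # Naive approach: subtract the distinct count from the total count.
    diff = conn.execute(
        "SELECT count(id3) - count(DISTINCT id3) FROM events"
    ).fetchone()[0]
    # Count the id3 values that appear in more than one row.
    repeated = conn.execute(
        "SELECT count(*) FROM "
        "(SELECT id3 FROM events GROUP BY id3 HAVING count(*) > 1)"
    ).fetchone()[0]
    conn.close()
    return diff, repeated

base = [(1, 1, 99), (1, 2, 99), (1, 3, 52)]
for extra in ([], [(1, 4, 99)], [(1, 4, 52)]):
    print(compare(base + extra))
```

The two numbers agree for the first sample and diverge for the second, where 99 appears three times but is still only one repeated ID.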

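When you aggregate by id1 alone, you lose that knowledge. You see how many rows there are and how many distinct id3 values there are, but not which of those id3 values are repeated. This means you need an intermediate step: a pre-aggregation before the final aggregation.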

select
  id1,
  sum(cnt) as total_count,
  count(distinct id3) as unique_count,
  count(case when cnt > 1 then 1 end) as repeated_count
from
(
  select id1, id3, count(*) as cnt
  from events
  group by id1, id3
) pre_aggregated
group by id1
order by id1;
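As a rough check of the pre-aggregation idea, the sketch below runs a query of the same shape through Python's standard sqlite3 module (an assumption for illustration) on the example table with one extra row, (1,4,99), added. The total is taken as sum(cnt), because the outer query sees one row per (id1, id3) pair rather than one row per original event.

```python
import sqlite3

QUERY = """
SELECT id1,
       sum(cnt) AS total_count,                              -- original rows per id1
       count(*) AS unique_count,                             -- one row per distinct id3
       count(CASE WHEN cnt > 1 THEN 1 END) AS repeated_count -- id3 values seen more than once
FROM (
  SELECT id1, id3, count(*) AS cnt
  FROM events
  GROUP BY id1, id3
) pre_aggregated
GROUP BY id1
ORDER BY id1
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id1, id2, id3, PRIMARY KEY (id1, id2));
INSERT INTO events (id1, id2, id3) VALUES
   (1,1,99), (1,2,99), (1,3,52), (1,4,99),
   (2,1,6),  (2,2,7),  (2,3,8);
""")

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With the extra row, id1 = 1 has four rows and two distinct id3 values, but only one of them (99) is repeated.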

If you group by id1, id3 like this:

SELECT id1, id3, COUNT(*) counter
FROM events
GROUP BY id1, id3;

you get the number of rows for each combination of id1, id3:

id1  id3  counter
1    52   1
1    99   2
2    6    1
2    7    1
2    8    1

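Now, all you have to do is:

  • sum the counter column for each id1 to get the total_count column
  • count the rows for each id1 to get the unique_count column
  • count the rows for each id1 where counter is > 1 to get the repeated_id3 column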

You can do this with the SUM() and COUNT() window functions:

SELECT DISTINCT id1, 
       SUM(COUNT(*)) OVER (PARTITION BY id1) AS total_count, 
       COUNT(*) OVER (PARTITION BY id1) AS unique_count,
       SUM(COUNT(*) > 1) OVER (PARTITION BY id1) repeated_id3
FROM events
GROUP BY id1, id3;

See the demo.
Result:

id1  total_count  unique_count  repeated_id3
1    3            2             1
2    3            3             0
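To reproduce this result locally, the window-function query can be run through Python's standard sqlite3 module, which is an assumption made for illustration. Window functions were added in SQLite 3.25.0, so the sketch checks the library version first; the helper name window_counts is hypothetical, and an ORDER BY is added to make the row order deterministic.

```python
import sqlite3

WINDOW_QUERY = """
SELECT DISTINCT id1,
       SUM(COUNT(*)) OVER (PARTITION BY id1) AS total_count,
       COUNT(*) OVER (PARTITION BY id1) AS unique_count,
       SUM(COUNT(*) > 1) OVER (PARTITION BY id1) AS repeated_id3
FROM events
GROUP BY id1, id3
ORDER BY id1
"""

def window_counts(conn):
    """Run the window-function query; requires SQLite 3.25.0 or newer."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25.0 or newer")
    return conn.execute(WINDOW_QUERY).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id1, id2, id3, PRIMARY KEY (id1, id2));
INSERT INTO events (id1, id2, id3) VALUES
   (1,1,99), (1,2,99), (1,3,52),
   (2,1,6),  (2,2,7),  (2,3,8);
""")

# Skip the query on SQLite builds that predate window functions.
supported = sqlite3.sqlite_version_info >= (3, 25, 0)
result = window_counts(conn) if supported else None
print(result)
```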