SQL get counts of unique values
How can I get the different "count" columns for this in-memory database example using sqlite3?
Using version 3.27.2.
Example database
CREATE TABLE events (
id1,
id2,
id3,
PRIMARY KEY (id1, id2)
);
INSERT INTO events (id1, id2, id3)
VALUES
(1,1,99),
(1,2,99),
(1,3,52),
(2,1,6),
(2,2,7),
(2,3,8)
;
.mode columns
.header on
SELECT * FROM events;
Desired printout
Partial success
The following works for the first two new columns.
SELECT id1, count(id3) AS total_count, count(DISTINCT id3) AS unique_count
FROM events
GROUP BY id1;
What is the best way to get the last column? The following returns error: no such column: total_count
SELECT id1, count(id3) AS total_count, count(DISTINCT id3) AS unique_count, (total_count - unique_count) AS repeated_count
FROM events
GROUP BY id1;
Maybe try a CTE. I have not verified the syntax, but judging from the SQLite documentation this looks like a valid option.
-- No semicolon inside the CTE body: it would end the statement early.
With X as
(
SELECT id1, count(id3) AS total_count, count(DISTINCT id3) AS unique_count
FROM events
GROUP BY id1
)
select id1, total_count, unique_count, (total_count - unique_count) AS repeated_count
from X;
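To check the CTE idea without the sqlite3 shell, here is a minimal sketch using Python's standard sqlite3 module. The Python wrapper, the variable names (conn, cte_query, cte_rows) and the added ORDER BY are my assumptions and not part of the answer above. The CTE body is written without a semicolon inside the parentheses.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id1, id2, id3, PRIMARY KEY (id1, id2))")
conn.executemany(
    "INSERT INTO events (id1, id2, id3) VALUES (?, ?, ?)",
    [(1, 1, 99), (1, 2, 99), (1, 3, 52), (2, 1, 6), (2, 2, 7), (2, 3, 8)],
)

# The aliases are defined inside the CTE, so the outer SELECT can reuse them.
cte_query = """
WITH X AS (
    SELECT id1, count(id3) AS total_count, count(DISTINCT id3) AS unique_count
    FROM events
    GROUP BY id1
)
SELECT id1, total_count, unique_count,
       (total_count - unique_count) AS repeated_count
FROM X
ORDER BY id1
"""
cte_rows = conn.execute(cte_query).fetchall()
print(cte_rows)  # [(1, 3, 2, 1), (2, 3, 3, 0)]
```

Here repeated_count is simply total minus unique, which counts the extra rows rather than the number of distinct id3 values that repeat.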
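It's not that easy :-)
In
(1,1,99), (1,2,99), (1,3,52)
there is one ID repeated (99).
In
(1,1,99), (1,2,99), (1,3,52), (1,4,99)
there is still one ID repeated (again 99).
In
(1,1,99), (1,2,99), (1,3,52), (1,4,52)
there are two IDs repeated (52 and 99).
When you aggregate by id1 alone, you lose that knowledge. You see how many rows there are and how many distinct id3 values there are, but not which of those id3 values are repeated. This means you need an intermediate step: a pre-aggregation before the final aggregation.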
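The three cases above can be compared side by side with a small plain-Python sketch. The helper name compare_counts is hypothetical. It returns the total-minus-unique difference next to the number of id3 values that occur more than once, so you can see where the two diverge.

```python
from collections import Counter


def compare_counts(id3_values):
    """Return (total - unique, number of id3 values occurring more than once)."""
    per_value = Counter(id3_values)  # rows per distinct id3
    difference = len(id3_values) - len(per_value)
    repeated_ids = sum(1 for n in per_value.values() if n > 1)
    return difference, repeated_ids


print(compare_counts([99, 99, 52]))      # (1, 1)
print(compare_counts([99, 99, 52, 99]))  # (2, 1)
print(compare_counts([99, 99, 52, 52]))  # (2, 2)
```

The second and third cases have the same difference but a different number of repeated IDs, which is why the per-id3 counts are needed.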
select
id1,
sum(cnt) as total_count, -- add up the per-id3 row counts; count(*) here would count groups
count(distinct id3) as unique_count,
count(case when cnt > 1 then 1 end) as repeated_count
from
(
select id1, id3, count(*) as cnt
from events
group by id1, id3
) pre_aggregated
group by id1
order by id1;
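A possible way to try the pre-aggregation is the sketch below, again with Python's standard sqlite3 module. The extra row (1,4,99) is taken from the second case discussed above. The variable names are my own, and the total is computed as sum(cnt), because count(*) over the pre-aggregated rows would count (id1, id3) groups rather than original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id1, id2, id3, PRIMARY KEY (id1, id2))")
conn.executemany(
    "INSERT INTO events (id1, id2, id3) VALUES (?, ?, ?)",
    [(1, 1, 99), (1, 2, 99), (1, 3, 52), (1, 4, 99),
     (2, 1, 6), (2, 2, 7), (2, 3, 8)],
)

# Inner query: one row per (id1, id3) with its row count.
# Outer query: summarize those rows per id1.
pre_aggregated_query = """
SELECT id1,
       sum(cnt) AS total_count,
       count(DISTINCT id3) AS unique_count,
       count(CASE WHEN cnt > 1 THEN 1 END) AS repeated_count
FROM (
    SELECT id1, id3, count(*) AS cnt
    FROM events
    GROUP BY id1, id3
) pre_aggregated
GROUP BY id1
ORDER BY id1
"""
pre_rows = conn.execute(pre_aggregated_query).fetchall()
print(pre_rows)  # [(1, 4, 2, 1), (2, 3, 3, 0)]
```

For id1 = 1 the value 99 appears three times, yet repeated_count stays at 1 because only one id3 value repeats.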
If you group by id1, id3 like this:
SELECT id1, id3, COUNT(*) counter
FROM events
GROUP BY id1, id3;
you get the number of rows for each combination of id1, id3:
id1 | id3 | counter |
---|---|---|
1 | 52 | 1 |
1 | 99 | 2 |
2 | 6 | 1 |
2 | 7 | 1 |
2 | 8 | 1 |
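Now, all you have to do is:
- sum the counter column for each id1 to get the total_count column
- count the rows for each id1 to get the unique_count column
- count the rows for each id1 where the counter column is > 1 to get the repeated_id3 column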
You can do this with the SUM() and COUNT() window functions:
SELECT DISTINCT id1,
SUM(COUNT(*)) OVER (PARTITION BY id1) AS total_count,
COUNT(*) OVER (PARTITION BY id1) AS unique_count,
SUM(COUNT(*) > 1) OVER (PARTITION BY id1) repeated_id3
FROM events
GROUP BY id1, id3;
See the demo.
Result:
id1 | total_count | unique_count | repeated_id3 |
---|---|---|---|
1 | 3 | 2 | 1 |
2 | 3 | 3 | 0 |
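As a cross-check that does not depend on window-function support, the three steps of this answer (sum the counters, count the rows, count the rows with a counter > 1) can be sketched in plain Python. This is only an illustration with names of my own choosing (events, counter, by_id1, summary), not the SQL from the answer.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (id1, id2, id3).
events = [(1, 1, 99), (1, 2, 99), (1, 3, 52), (2, 1, 6), (2, 2, 7), (2, 3, 8)]

# Rows per (id1, id3) combination, like GROUP BY id1, id3.
counter = Counter((id1, id3) for id1, _id2, id3 in events)

# Collect the per-combination counters under their id1.
by_id1 = defaultdict(list)
for (id1, _id3), n in counter.items():
    by_id1[id1].append(n)

# Per id1: sum of counters, number of rows, number of rows with counter > 1.
summary = [
    (id1, sum(ns), len(ns), sum(1 for n in ns if n > 1))
    for id1, ns in sorted(by_id1.items())
]
print(summary)  # [(1, 3, 2, 1), (2, 3, 3, 0)]
```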