Selecting specific rows from a table, using a group by query
I have a table that looks like this:
+------------+------------+--------------+
| Date | Name | Certificates |
+------------+------------+--------------+
| 2021-02-01 | Jason | 3 |
| 2021-02-01 | Nisha | 4 |
| 2021-02-01 | Zaid | 5 |
| 2021-03-25 | Aniket | 4 |
| 2021-03-25 | Anish | 2 |
| 2021-03-25 | Nadia | 0 |
| 2021-05-06 | Aadil | 7 |
| 2021-05-06 | Ashish | 1 |
| 2021-05-06 | Rahil | 9 |
+------------+------------+--------------+
This result was obtained by running the following SQL query:
SELECT
Date, Name, COUNT(Certificates) as Certificates
FROM Students.data
GROUP BY Date, Name
ORDER BY Date, Name;
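Given this result, I would ideally like to keep only the first entry for each date (basically the first name for each date), so it should look like this: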
+------------+------------+--------------+
| Date | Name | Certificates |
+------------+------------+--------------+
| 2021-02-01 | Jason | 3 |
| 2021-03-25 | Aniket | 4 |
| 2021-05-06 | Aadil | 7 |
+------------+------------+--------------+
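Is there a way to modify the GROUP BY query above to get this result, or do I need to pass the result of this query to another query? If so, what would that query be?
Thanks.
Also, the database I'm using is ClickHouse.
Note: if anything about the question is unclear, let me know and I can clarify.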
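Looking at your output, I assume the single entry you want for each day is the one whose Name comes first alphabetically (ASC).
In that case, if this is SQL Server, you can use the ROW_NUMBER() function: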
SELECT Date,Name, Certificates
FROM
(
SELECT
Date, Name,
Certificates=COUNT(Certificates) OVER (PARTITION BY Date,Name),
RowNumber = ROW_NUMBER() OVER (PARTITION BY Date
ORDER BY Name ASC)
FROM Students.data
) T
WHERE RowNumber =1
ORDER BY Date ASC
;
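The query above targets SQL Server. Below is a rough, runnable check of the same idea: a windowed COUNT replaces GROUP BY, and ROW_NUMBER keeps one row per date. It uses Python's built-in sqlite3 module as a stand-in and assumes SQLite 3.25 or newer for window functions. SQLite does not accept SQL Server's `alias = expression` form, so the sketch uses `AS` instead. The raw table `data` and its rows are hypothetical. They are rebuilt from the counts in the question, with one row per certificate, and a count of 0 is modelled as a single row with a NULL certificate.

```python
import sqlite3

# Per-(date, name) counts taken from the question's intermediate result.
COUNTS = [
    ("2021-02-01", "Jason", 3), ("2021-02-01", "Nisha", 4), ("2021-02-01", "Zaid", 5),
    ("2021-03-25", "Aniket", 4), ("2021-03-25", "Anish", 2), ("2021-03-25", "Nadia", 0),
    ("2021-05-06", "Aadil", 7), ("2021-05-06", "Ashish", 1), ("2021-05-06", "Rahil", 9),
]

conn = sqlite3.connect(":memory:")
# Hypothetical raw table: one row per certificate. A count of 0 becomes a single
# row with a NULL certificate, because COUNT(column) ignores NULLs.
conn.execute("CREATE TABLE data (date TEXT, name TEXT, certificates INTEGER)")
for date, name, n in COUNTS:
    rows = [(date, name, i) for i in range(1, n + 1)] or [(date, name, None)]
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# No GROUP BY: every raw row carries the count for its (date, name) partition
# (no ORDER BY in that window, so the frame is the whole partition), and
# ROW_NUMBER numbers rows per date by name so that rownumber = 1 keeps one row
# from the alphabetically first name.
query = """
SELECT date, name, certificates
FROM (
    SELECT
        date, name,
        COUNT(certificates) OVER (PARTITION BY date, name) AS certificates,
        ROW_NUMBER() OVER (PARTITION BY date ORDER BY name ASC) AS rownumber
    FROM data
) t
WHERE rownumber = 1
ORDER BY date ASC
"""
result = conn.execute(query).fetchall()
print(result)
# -> [('2021-02-01', 'Jason', 3), ('2021-03-25', 'Aniket', 4), ('2021-05-06', 'Aadil', 7)]
```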
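You treat your result as an intermediate result from which you want to pick one row per date. You can use ROW_NUMBER
to number each date's rows by name and keep only the first row of each date (the rows numbered 1).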
SELECT date, name, certificates
FROM
(
SELECT
date, name, COUNT(Certificates) AS certificates,
ROW_NUMBER() OVER (PARTITION BY date ORDER BY name) AS rn
FROM students.data
GROUP BY date, name
) numbered
WHERE rn = 1
ORDER BY date;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=93c3682bda72cb4fe53fbbe8053a8acb (this uses MySQL 8 because dbfiddle.uk does not offer ClickHouse, but the query is standard SQL, so the demo should carry over to any modern RDBMS).
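Here is a minimal local equivalent of that demo. It uses Python's built-in sqlite3 module instead of MySQL or ClickHouse and assumes SQLite 3.25 or newer. The table `data` is hypothetical: one row per certificate, rebuilt from the question's counts, with a count of 0 stored as a single NULL row. The point it illustrates is that window functions are evaluated after GROUP BY, so ROW_NUMBER numbers the already-grouped (date, name) rows.

```python
import sqlite3

# Per-(date, name) counts taken from the question's intermediate result.
COUNTS = [
    ("2021-02-01", "Jason", 3), ("2021-02-01", "Nisha", 4), ("2021-02-01", "Zaid", 5),
    ("2021-03-25", "Aniket", 4), ("2021-03-25", "Anish", 2), ("2021-03-25", "Nadia", 0),
    ("2021-05-06", "Aadil", 7), ("2021-05-06", "Ashish", 1), ("2021-05-06", "Rahil", 9),
]

conn = sqlite3.connect(":memory:")
# Hypothetical raw table: one row per certificate; a NULL row stands for a count of 0.
conn.execute("CREATE TABLE data (date TEXT, name TEXT, certificates INTEGER)")
for date, name, n in COUNTS:
    rows = [(date, name, i) for i in range(1, n + 1)] or [(date, name, None)]
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# Inner query: the original GROUP BY plus ROW_NUMBER over the grouped rows.
# Outer query: keep only the first-numbered row of each date.
query = """
SELECT date, name, certificates
FROM (
    SELECT
        date, name, COUNT(certificates) AS certificates,
        ROW_NUMBER() OVER (PARTITION BY date ORDER BY name) AS rn
    FROM data
    GROUP BY date, name
) numbered
WHERE rn = 1
ORDER BY date
"""
result = conn.execute(query).fetchall()
print(result)
# -> [('2021-02-01', 'Jason', 3), ('2021-03-25', 'Aniket', 4), ('2021-05-06', 'Aadil', 7)]
```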
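- Use a CTE instead of a subquery
- Rank the data with ROW_NUMBER: each row keeps the same data, but the rank increments within each date
- Filter on rank_ = 1 to get one entry per date
- This assumes what you need is the alphabetically first name
If you don't have the count yet, use Code 1
fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a5a8bf3f6934f18b19d331d3ba43570a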
WITH ranked_data AS (
SELECT date_, name,
COUNT(certificates_) OVER (PARTITION BY date_, name) AS certificates,
row_number() OVER(PARTITION BY date_ order by name) as rank_
FROM students
)
SELECT
date_, name, certificates FROM ranked_data WHERE rank_ = 1
If you already know the count, use Code 2
WITH ranked_data AS (
SELECT date_, name, certificates_,
row_number() OVER(PARTITION BY date_ order by name) as rank_
FROM students
)
SELECT
date_, name, certificates_ FROM ranked_data WHERE rank_ = 1
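Here is a small runnable check of Code 2. It uses Python's sqlite3 module as a stand-in for the real database and assumes SQLite 3.25 or newer. It also assumes a `students` table that already holds the per-date counts, with the column names `date_`, `name`, `certificates_` from the answer. An `ORDER BY date_` has been added only so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout for Code 2: the counts are already stored per (date_, name).
conn.execute("CREATE TABLE students (date_ TEXT, name TEXT, certificates_ INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("2021-02-01", "Jason", 3), ("2021-02-01", "Nisha", 4), ("2021-02-01", "Zaid", 5),
        ("2021-03-25", "Aniket", 4), ("2021-03-25", "Anish", 2), ("2021-03-25", "Nadia", 0),
        ("2021-05-06", "Aadil", 7), ("2021-05-06", "Ashish", 1), ("2021-05-06", "Rahil", 9),
    ],
)

# The CTE ranks rows within each date by name; the outer SELECT keeps rank 1.
query = """
WITH ranked_data AS (
    SELECT date_, name, certificates_,
           ROW_NUMBER() OVER (PARTITION BY date_ ORDER BY name) AS rank_
    FROM students
)
SELECT date_, name, certificates_
FROM ranked_data
WHERE rank_ = 1
ORDER BY date_  -- added for a deterministic output order
"""
result = conn.execute(query).fetchall()
print(result)
# -> [('2021-02-01', 'Jason', 3), ('2021-03-25', 'Aniket', 4), ('2021-05-06', 'Aadil', 7)]
```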
- The straightforward way
SELECT Date, untuple(groupArray(tuple(Name, Certificates))[1])
FROM (
SELECT *
FROM (
/* Emulate the test dataset. */
SELECT toDate(data.1) AS Date, data.2 AS Name, data.3 AS Certificates
FROM (
SELECT arrayJoin([
('2021-02-01', 'Jason ', 3),
('2021-02-01', 'Nisha ', 4),
('2021-02-01', 'Zaid ', 5),
('2021-03-25', 'Aniket', 4),
('2021-03-25', 'Anish ', 2),
('2021-03-25', 'Nadia ', 0),
('2021-05-06', 'Aadil ', 7),
('2021-05-06', 'Ashish', 1),
('2021-05-06', 'Rahil ', 9)]) AS data
)
)
ORDER BY Date, Name
)
GROUP BY Date
/*
┌───────Date─┬─Name───┬─Certificates─┐
│ 2021-02-01 │ Jason │ 3 │
│ 2021-03-25 │ Aniket │ 4 │
│ 2021-05-06 │ Aadil │ 7 │
└────────────┴────────┴──────────────┘
*/
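To show what the query above does step by step, here is a plain-Python sketch of the same logic. This is an illustration only, not ClickHouse. It sorts by (Date, Name), collects each date's (Name, Certificates) pairs into a list the way `groupArray(tuple(...))` does, and keeps the first pair. ClickHouse arrays are 1-based, so `[1]` there corresponds to `[0]` here. The rows are the question's sample data, listed in a deliberately unsorted order so that the sort is visibly what makes "first" mean "alphabetically first".

```python
from itertools import groupby
from operator import itemgetter

# The question's sample rows, intentionally out of order.
ROWS = [
    ("2021-05-06", "Rahil", 9), ("2021-02-01", "Zaid", 5), ("2021-03-25", "Nadia", 0),
    ("2021-05-06", "Ashish", 1), ("2021-02-01", "Nisha", 4), ("2021-03-25", "Anish", 2),
    ("2021-05-06", "Aadil", 7), ("2021-02-01", "Jason", 3), ("2021-03-25", "Aniket", 4),
]


def first_per_date(rows):
    """Return (date, name, certificates) for the alphabetically first name of each date."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # ORDER BY Date, Name
    out = []
    for date, group in groupby(ordered, key=itemgetter(0)):  # GROUP BY Date
        pairs = [(name, certs) for _, name, certs in group]  # groupArray(tuple(Name, Certificates))
        name, certs = pairs[0]  # ClickHouse's [1] (arrays are 1-based)
        out.append((date, name, certs))  # untuple(...) spreads the pair into columns
    return out


result = first_per_date(ROWS)
print(result)
# -> [('2021-02-01', 'Jason', 3), ('2021-03-25', 'Aniket', 4), ('2021-05-06', 'Aadil', 7)]
```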
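- The way based on window functions
Full support for window functions was added in version 21.4. At the time of writing, it is marked as an experimental feature, hence the allow_experimental_window_functions setting in the query.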
SELECT DISTINCT
Date,
FIRST_VALUE(Name) OVER w AS FirstName,
FIRST_VALUE(Certificates) OVER w AS FirstCertificates
FROM
(
/* Emulate the test dataset. */
SELECT toDate(data.1) AS Date, data.2 AS Name, data.3 AS Certificates
FROM (
SELECT arrayJoin([
('2021-02-01', 'Jason ', 3),
('2021-02-01', 'Nisha ', 4),
('2021-02-01', 'Zaid ', 5),
('2021-03-25', 'Aniket', 4),
('2021-03-25', 'Anish ', 2),
('2021-03-25', 'Nadia ', 0),
('2021-05-06', 'Aadil ', 7),
('2021-05-06', 'Ashish', 1),
('2021-05-06', 'Rahil ', 9)]) AS data
)
)
WINDOW w AS (PARTITION BY Date ORDER BY Name ASC)
SETTINGS allow_experimental_window_functions = 1
/*
┌───────Date─┬─FirstName─┬─FirstCertificates─┐
│ 2021-02-01 │ Jason │ 3 │
│ 2021-03-25 │ Aniket │ 4 │
│ 2021-05-06 │ Aadil │ 7 │
└────────────┴───────────┴───────────────────┘
*/
See https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art.
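You can also try the FIRST_VALUE plus SELECT DISTINCT pattern outside ClickHouse. The sketch below uses Python's sqlite3 module as a stand-in and assumes SQLite 3.28 or newer, which supports window functions and the named WINDOW clause. The table name `data` is hypothetical and holds the question's per-date counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the question's per-(date, name) counts.
conn.execute("CREATE TABLE data (date TEXT, name TEXT, certificates INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("2021-02-01", "Jason", 3), ("2021-02-01", "Nisha", 4), ("2021-02-01", "Zaid", 5),
        ("2021-03-25", "Aniket", 4), ("2021-03-25", "Anish", 2), ("2021-03-25", "Nadia", 0),
        ("2021-05-06", "Aadil", 7), ("2021-05-06", "Ashish", 1), ("2021-05-06", "Rahil", 9),
    ],
)

# Every row in a date partition gets the same first (name, certificates) pair;
# DISTINCT, applied after the window functions, collapses them to one row per date.
query = """
SELECT DISTINCT
    date,
    FIRST_VALUE(name) OVER w AS first_name,
    FIRST_VALUE(certificates) OVER w AS first_certificates
FROM data
WINDOW w AS (PARTITION BY date ORDER BY name ASC)
ORDER BY date
"""
result = conn.execute(query).fetchall()
print(result)
# -> [('2021-02-01', 'Jason', 3), ('2021-03-25', 'Aniket', 4), ('2021-05-06', 'Aadil', 7)]
```

With the default frame, FIRST_VALUE always reads the first row of the partition's ordering. That is why it is safe to deduplicate afterwards rather than filter on a row number.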