Selecting specific rows from a table, using a group by query

I have a table that looks like this:

+------------+------------+--------------+
| Date       | Name       | Certificates |
+------------+------------+--------------+
| 2021-02-01 | Jason      | 3            |
| 2021-02-01 | Nisha      | 4            |
| 2021-02-01 | Zaid       | 5            |
| 2021-03-25 | Aniket     | 4            |
| 2021-03-25 | Anish      | 2            |
| 2021-03-25 | Nadia      | 0            |
| 2021-05-06 | Aadil      | 7            |
| 2021-05-06 | Ashish     | 1            |
| 2021-05-06 | Rahil      | 9            |
+------------+------------+--------------+

This result was obtained by running the following SQL query:

SELECT 
    Date, Name, COUNT(Certificates) as Certificates
FROM Students.data
GROUP BY Date, Name
ORDER BY Date, Name;

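Given this result, I would ideally like to keep only the first entry for each date (that is, the first name for each date), which should look like this: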

+------------+------------+--------------+
| Date       | Name       | Certificates |
+------------+------------+--------------+
| 2021-02-01 | Jason      | 3            |
| 2021-03-25 | Aniket     | 4            |
| 2021-05-06 | Aadil      | 7            |
+------------+------------+--------------+

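Is there a way to modify the GROUP BY query above to get this result, or do I need to pass its result to another query? If so, what would that query be? Thanks.

Also, the database I am using is ClickHouse.

Note: If anything about the question is unclear, please let me know and I will clarify.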

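Looking at your output, I assume the single entry you want for each date is the one that comes first when the Name column is sorted alphabetically (ASC).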

In that case, if this is SQL Server, you can use the ROW_NUMBER() function:
SELECT Date,Name, Certificates
FROM
(
SELECT 
    Date, Name, 
    COUNT(Certificates) OVER (PARTITION BY Date, Name) AS Certificates,
    ROW_NUMBER() OVER (PARTITION BY Date
     ORDER BY Name ASC) AS RowNumber
FROM Students.data
) T 
WHERE RowNumber =1 
ORDER BY Date ASC
;

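Treat your result as an intermediate result from which you want to pick one row per date. You can use ROW_NUMBER to number the rows of each date by name and keep only the first row per date (the rows numbered 1).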

SELECT date, name, certificates
FROM
(
  SELECT 
    date, name, COUNT(Certificates) AS certificates,
    ROW_NUMBER() OVER (PARTITION BY date ORDER BY name) AS rn
  FROM students.data
  GROUP BY date, name
) numbered
WHERE rn = 1
ORDER BY date;

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=93c3682bda72cb4fe53fbbe8053a8acb (MySQL 8 is used here because dbfiddle.uk does not offer ClickHouse, but the query is standard SQL, so just about any modern RDBMS will do for the demo).
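If you would rather check the query locally, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in engine (SQLite 3.25 or newer is needed for window functions). The raw rows are an assumption: the question only shows the aggregated counts, so the script invents one row per certificate, plus a single NULL row for the student whose count is 0. The table name `data` and the `cert-N` values are hypothetical.

```python
import sqlite3

# Assumed raw data: (date, name, number of certificate rows to generate).
# Only the aggregated counts appear in the question; the raw rows are made up.
COUNTS = [
    ("2021-02-01", "Jason", 3), ("2021-02-01", "Nisha", 4), ("2021-02-01", "Zaid", 5),
    ("2021-03-25", "Aniket", 4), ("2021-03-25", "Anish", 2), ("2021-03-25", "Nadia", 0),
    ("2021-05-06", "Aadil", 7), ("2021-05-06", "Ashish", 1), ("2021-05-06", "Rahil", 9),
]

# Same shape as the query above: aggregate, number the groups per date, keep row 1.
QUERY = """
SELECT Date, Name, Certificates
FROM (
    SELECT Date, Name, COUNT(Certificates) AS Certificates,
           ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Name) AS rn
    FROM data
    GROUP BY Date, Name
) numbered
WHERE rn = 1
ORDER BY Date
"""


def first_entry_per_date():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (Date TEXT, Name TEXT, Certificates TEXT)")
    for date, name, n in COUNTS:
        if n == 0:
            # A NULL certificate keeps the student in the table with COUNT = 0.
            conn.execute("INSERT INTO data VALUES (?, ?, NULL)", (date, name))
        else:
            conn.executemany(
                "INSERT INTO data VALUES (?, ?, ?)",
                [(date, name, f"cert-{i}") for i in range(n)],
            )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in first_entry_per_date():
        print(row)
```

The window function is evaluated after the GROUP BY, so ROW_NUMBER numbers the grouped (date, name) rows rather than the raw certificate rows.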

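  • Use a CTE instead of a subquery
  • Rank the data: every row keeps the same data but gets an incrementing number --> ROW_NUMBER
  • Filter on rank_ = 1 to get one entry per date
  • This assumes the name you want is the alphabetically first one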

If you do not have the counts yet, use code 1 (fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a5a8bf3f6934f18b19d331d3ba43570a):

WITH ranked_data AS (
SELECT date_, name,
count(certificates_) OVER(PARTITION BY date_, name) as certificates,
row_number() OVER(PARTITION BY date_ order by name) as rank_
FROM students
)
SELECT 
  date_, name, certificates FROM ranked_data WHERE rank_ = 1

If you already have the counts, use code 2:


WITH ranked_data AS (
SELECT date_, name, certificates_,
row_number() OVER(PARTITION BY date_ order by name) as rank_
FROM students
)
SELECT 
  date_, name, certificates_ FROM ranked_data WHERE rank_ = 1

  • The straightforward way
SELECT Date, untuple(groupArray(tuple(Name, Certificates))[1])
FROM (
    SELECT *
    FROM  (
        /* Emulate the test dataset. */
        SELECT toDate(data.1) AS Date, data.2 AS Name, data.3 AS Certificates
        FROM (
            SELECT arrayJoin([
                ('2021-02-01', 'Jason ', 3),
                ('2021-02-01', 'Nisha ', 4),
                ('2021-02-01', 'Zaid  ', 5),
                ('2021-03-25', 'Aniket', 4),
                ('2021-03-25', 'Anish ', 2),
                ('2021-03-25', 'Nadia ', 0),
                ('2021-05-06', 'Aadil ', 7),
                ('2021-05-06', 'Ashish', 1),
                ('2021-05-06', 'Rahil ', 9)]) AS data
            )
        )
    ORDER BY Date, Name
    )
GROUP BY Date

/*
┌───────Date─┬─Name───┬─Certificates─┐
│ 2021-02-01 │ Jason  │            3 │
│ 2021-03-25 │ Aniket │            4 │
│ 2021-05-06 │ Aadil  │            7 │
└────────────┴────────┴──────────────┘
*/
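To make it clearer what the query above computes, here is a rough plain-Python sketch of the same idea: sort the rows by (Date, Name), collect them per date, and keep the first element of each group. This only illustrates the logic and is not how ClickHouse executes the query. The `ROWS` list and the function name are hypothetical, and the trailing padding spaces in the names are dropped.

```python
from itertools import groupby
from operator import itemgetter

# Aggregated rows from the question: (date, name, certificates).
ROWS = [
    ("2021-02-01", "Jason", 3), ("2021-02-01", "Nisha", 4), ("2021-02-01", "Zaid", 5),
    ("2021-03-25", "Aniket", 4), ("2021-03-25", "Anish", 2), ("2021-03-25", "Nadia", 0),
    ("2021-05-06", "Aadil", 7), ("2021-05-06", "Ashish", 1), ("2021-05-06", "Rahil", 9),
]


def first_per_date(rows):
    # Counterpart of ORDER BY Date, Name in the inner query.
    ordered = sorted(rows, key=itemgetter(0, 1))
    # Counterpart of GROUP BY Date plus taking element [1] of the grouped array.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


if __name__ == "__main__":
    for row in first_per_date(ROWS):
        print(row)
```

As in the SQL version, the result depends on the rows being sorted before they are grouped.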
  • A way based on window functions

Full support for window functions was added starting with version 21.4. At that point it was marked as an experimental feature.

SELECT DISTINCT
    Date,
    FIRST_VALUE(Name) OVER w AS FirstName,
    FIRST_VALUE(Certificates) OVER w AS FirstCertificates
FROM 
(
    /* Emulate the test dataset. */
    SELECT toDate(data.1) AS Date, data.2 AS Name, data.3 AS Certificates
    FROM (
        SELECT arrayJoin([
            ('2021-02-01', 'Jason ', 3),
            ('2021-02-01', 'Nisha ', 4),
            ('2021-02-01', 'Zaid  ', 5),
            ('2021-03-25', 'Aniket', 4),
            ('2021-03-25', 'Anish ', 2),
            ('2021-03-25', 'Nadia ', 0),
            ('2021-05-06', 'Aadil ', 7),
            ('2021-05-06', 'Ashish', 1),
            ('2021-05-06', 'Rahil ', 9)]) AS data
        )
)
WINDOW w AS (PARTITION BY Date ORDER BY Name ASC)
SETTINGS allow_experimental_window_functions = 1

/*
┌───────Date─┬─FirstName─┬─FirstCertificates─┐
│ 2021-02-01 │ Jason     │                 3 │
│ 2021-03-25 │ Aniket    │                 4 │
│ 2021-05-06 │ Aadil     │                 7 │
└────────────┴───────────┴───────────────────┘
*/

See https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art