Get the 2 options with the min value for each student_id

Question

I have a table named m_option:

m_option_id  m_student_id  value
1             1             5
2             1             5
3             1             6
4             1             7
5             2             1
6             2             2
7             2             3
8             2             3
9             2             4

For each m_student_id, I want to get the 2 rows with the lowest value:

m_option_id  m_student_id  value
1             1             5
2             1             5
5             2             1
6             2             2

Answer 1

You can use the row_number window function for this:

SELECT m_option_id, m_student_id, value
FROM (
    SELECT
        m_option_id, m_student_id, value,
        -- number the rows of each student, lowest value first
        row_number() OVER (PARTITION BY m_student_id ORDER BY value) AS rn
    FROM m_option
) t
WHERE
    rn <= 2;  -- keep the first 2 rows per student

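row_number numbers each row within its group (partition). We then use that number to keep the first 2 rows of each group, i.e. the ones with the lowest value. Among rows with an equal value the order is arbitrary unless you add a tiebreaker such as m_option_id to the ORDER BY.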
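To check the query against the sample data from the question, here is a small self-contained sketch. It is a stand-in under stated assumptions. It runs the same row_number() query through Python's built-in sqlite3 module rather than PostgreSQL, which needs SQLite 3.25+ for window functions. It also adds m_option_id as a tiebreaker so the output is deterministic. The helper name two_lowest_per_student is hypothetical.

```python
import sqlite3

# Sample data from the question: (m_option_id, m_student_id, value)
ROWS = [
    (1, 1, 5), (2, 1, 5), (3, 1, 6), (4, 1, 7),
    (5, 2, 1), (6, 2, 2), (7, 2, 3), (8, 2, 3), (9, 2, 4),
]

# Same query as in the answer; m_option_id as tiebreaker is an added assumption
QUERY = """
    SELECT m_option_id, m_student_id, value
    FROM (
        SELECT m_option_id, m_student_id, value,
               row_number() OVER (PARTITION BY m_student_id
                                  ORDER BY value, m_option_id) AS rn
        FROM m_option
    ) t
    WHERE rn <= 2
    ORDER BY m_student_id, value, m_option_id
"""


def two_lowest_per_student(conn):
    """Hypothetical helper: return the 2 lowest-value rows of each student."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE m_option ("
    "m_option_id INTEGER PRIMARY KEY, m_student_id INTEGER, value INTEGER)"
)
conn.executemany("INSERT INTO m_option VALUES (?, ?, ?)", ROWS)

result = two_lowest_per_student(conn)
print(result)  # [(1, 1, 5), (2, 1, 5), (5, 2, 1), (6, 2, 2)]
```

The result matches the expected output from the question.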

Alternatively, you can use a LATERAL subquery:

SELECT m_option_id, m_student_id, value
FROM (SELECT DISTINCT m_student_id FROM m_option) s,  -- one row per student
     LATERAL (
         -- evaluated once per row of s: that student's 2 lowest values
         SELECT m_option_id, value
         FROM m_option
         WHERE m_student_id = s.m_student_id
         ORDER BY value
         LIMIT 2
     ) t;

This iterates over all distinct values of m_student_id and, for each one, uses the LATERAL subquery to find the first 2 rows.
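Conceptually, LATERAL behaves like a nested loop. The sketch below spells that loop out in Python with sqlite3, because SQLite has no LATERAL. The function name top2_via_loop is hypothetical. PostgreSQL runs this loop inside the executor instead of issuing one query per student, so this only illustrates the semantics, not the performance.

```python
import sqlite3

# Sample data from the question: (m_option_id, m_student_id, value)
ROWS = [
    (1, 1, 5), (2, 1, 5), (3, 1, 6), (4, 1, 7),
    (5, 2, 1), (6, 2, 2), (7, 2, 3), (8, 2, 3), (9, 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE m_option ("
    "m_option_id INTEGER PRIMARY KEY, m_student_id INTEGER, value INTEGER)"
)
conn.executemany("INSERT INTO m_option VALUES (?, ?, ?)", ROWS)


def top2_via_loop(conn):
    """Hypothetical helper: emulate the LATERAL plan with an explicit loop."""
    result = []
    # Outer relation "s": the distinct student ids (materialized first)
    student_ids = [row[0] for row in conn.execute(
        "SELECT DISTINCT m_student_id FROM m_option ORDER BY m_student_id")]
    for sid in student_ids:
        # Inner relation "t": re-evaluated for every outer row, like LATERAL
        inner = conn.execute(
            "SELECT m_option_id, value FROM m_option "
            "WHERE m_student_id = ? ORDER BY value, m_option_id LIMIT 2",
            (sid,),
        )
        for option_id, value in inner:
            result.append((option_id, sid, value))
    return result


result = top2_via_loop(conn)
print(result)  # [(1, 1, 5), (2, 1, 5), (5, 2, 1), (6, 2, 2)]
```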

Answer 2

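Assuming there can be many rows per student in table m_option, index usage is the key to performance. It is most efficient if you have a separate student table that lists every student exactly once (as you typically would). Then: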

SELECT m.m_option_id, s.student_id AS m_student_id, m.value
FROM   student s
    ,  LATERAL (
   SELECT m_option_id, value
   FROM   m_option
   WHERE  m_student_id = s.student_id  -- PK of table student
   ORDER  BY value
   LIMIT  2
   ) m;

A multicolumn index on m_option makes this fast:

CREATE INDEX m_option_combo_idx ON m_option (m_student_id, value);

If you can get index-only scans out of it, append the column m_option_id as the last index item instead:

CREATE INDEX m_option_combo_idx ON m_option (m_student_id, value, m_option_id);

Use the index columns in this order. Why? See:

Is a composite index also good for queries on the first field?
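SQLite's closest analogue to an index-only scan is a search on a covering index. Its EXPLAIN QUERY PLAN therefore gives a rough way to see what the multicolumn index does for the per-student subquery. This is only a sketch under that assumption. PostgreSQL's own EXPLAIN output looks different, and whether PostgreSQL actually avoids heap fetches in an index-only scan also depends on its visibility map.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE m_option ("
    "m_option_id INTEGER PRIMARY KEY, m_student_id INTEGER, value INTEGER)"
)
# The three-column index from the answer
conn.execute(
    "CREATE INDEX m_option_combo_idx ON m_option (m_student_id, value, m_option_id)"
)

# The per-student subquery, with the student id as a bound parameter
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT m_option_id, value FROM m_option "
    "WHERE m_student_id = ? ORDER BY value LIMIT 2",
    (1,),
).fetchall()

# Each plan row ends with a human-readable "detail" string
details = [row[-1] for row in plan]
for line in details:
    print(line)
# Expect a SEARCH using the covering index and no separate sort step
# (no "USE TEMP B-TREE FOR ORDER BY"), because the index already
# delivers each student's rows ordered by value.
```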

Extracting the unique list of student_id values from m_option instead would incur an expensive sequential scan of m_option and void any performance benefit.

This excludes students without any related rows in m_option. Use LEFT JOIN LATERAL (...) ON true to include such students in the result (extended with NULL values for the missing options).
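SQLite has no LATERAL, so this sketch substitutes a plain LEFT JOIN to a row_number() derived table. It shows the effect described above: a student with no options, here a hypothetical student 3, still appears, with NULL (None in Python) for m_option_id and value. It illustrates the NULL extension only. It does not reproduce the index-driven plan of PostgreSQL's LEFT JOIN LATERAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY);
    CREATE TABLE m_option (
        m_option_id  INTEGER PRIMARY KEY,
        m_student_id INTEGER REFERENCES student (student_id),
        value        INTEGER
    );
    -- Hypothetical student 3 has no rows in m_option
    INSERT INTO student VALUES (1), (2), (3);
    INSERT INTO m_option VALUES
        (1, 1, 5), (2, 1, 5), (3, 1, 6), (4, 1, 7),
        (5, 2, 1), (6, 2, 2), (7, 2, 3), (8, 2, 3), (9, 2, 4);
""")

result = conn.execute("""
    SELECT m.m_option_id, s.student_id AS m_student_id, m.value
    FROM student s
    LEFT JOIN (
        SELECT m_option_id, m_student_id, value,
               row_number() OVER (PARTITION BY m_student_id
                                  ORDER BY value, m_option_id) AS rn
        FROM m_option
    ) m ON m.m_student_id = s.student_id AND m.rn <= 2  -- filter in ON, not WHERE
    ORDER BY s.student_id, m.value, m.m_option_id
""").fetchall()
print(result)  # [(1, 1, 5), (2, 1, 5), (5, 2, 1), (6, 2, 2), (None, 3, None)]
```

Putting `m.rn <= 2` in the ON clause matters. In WHERE it would discard the NULL-extended row for student 3 again.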

If you don't have a student table, another fast option is a recursive CTE.
Detailed explanation for either variant:

Optimize GROUP BY query to retrieve latest record per user
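Here is a sketch of the recursive-CTE idea. It assumes the CTE meant here is the common "loose index scan" emulation. Each recursive step fetches the next larger m_student_id with a min() lookup that the index on (m_student_id, ...) can answer, instead of running DISTINCT over the whole table. It runs in SQLite through Python's sqlite3 module, and the top-2 step uses a correlated IN (... LIMIT 2) subquery in place of LATERAL. The exact PostgreSQL variant is in the linked answer.

```python
import sqlite3

# Sample data from the question: (m_option_id, m_student_id, value)
ROWS = [
    (1, 1, 5), (2, 1, 5), (3, 1, 6), (4, 1, 7),
    (5, 2, 1), (6, 2, 2), (7, 2, 3), (8, 2, 3), (9, 2, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE m_option ("
    "m_option_id INTEGER PRIMARY KEY, m_student_id INTEGER, value INTEGER)"
)
conn.execute(
    "CREATE INDEX m_option_combo_idx ON m_option (m_student_id, value, m_option_id)"
)
conn.executemany("INSERT INTO m_option VALUES (?, ?, ?)", ROWS)

QUERY = """
    WITH RECURSIVE s(m_student_id) AS (
        SELECT min(m_student_id) FROM m_option        -- smallest student id
        UNION ALL
        SELECT (SELECT min(m_student_id)              -- next larger id,
                FROM m_option                         -- found via the index
                WHERE m_student_id > s.m_student_id)
        FROM s
        WHERE s.m_student_id IS NOT NULL              -- stop after the last id
    )
    SELECT o.m_option_id, o.m_student_id, o.value
    FROM s
    JOIN m_option o ON o.m_student_id = s.m_student_id
    WHERE o.m_option_id IN (
        SELECT m_option_id FROM m_option              -- top 2 for this student
        WHERE m_student_id = s.m_student_id
        ORDER BY value, m_option_id
        LIMIT 2
    )
    ORDER BY o.m_student_id, o.value, o.m_option_id
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 1, 5), (2, 1, 5), (5, 2, 1), (6, 2, 2)]
```

The recursion yields 1, 2 and finally NULL. The NULL row stops the recursion and matches nothing in the join.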

Tags: sql, postgresql, greatest-n-per-group