SELECT fields from one table with aggregates from a related table
Here is a simplified description of the 2 tables:
CREATE TABLE jobs(id PRIMARY KEY, description);
CREATE TABLE dates(id PRIMARY KEY, job REFERENCES jobs(id), date);
Each job may have one or more dates.
I would like to create a query that produces the following (in pidgin):
jobs.id, jobs.description, min(dates.date) as start, max(dates.date) as finish
I have tried something like this:
SELECT id, description,
(SELECT min(date) as start FROM dates d WHERE d.job=j.id),
(SELECT max(date) as finish FROM dates d WHERE d.job=j.id)
FROM jobs j;
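This works, but it looks very inefficient: it runs two correlated subqueries for every row of jobs.
I have tried an INNER JOIN, but can't see how to join jobs with a suitable aggregate query on dates.
Can anybody suggest a clean, efficient way to do this?
When retrieving all rows: aggregate first, join later: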
SELECT id, j.description, d.start, d.finish
FROM jobs j
LEFT JOIN (
SELECT job AS id, min(date) AS start, max(date) AS finish
FROM dates
GROUP BY job
) d USING (id);
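To try the aggregate-then-join query end to end, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and job 3 (which has no dates) is an assumption added only to show why LEFT JOIN is used; the question itself says every job has at least one date. SQLite is assumed here because the typeless schema from the question runs there as written.

```python
import sqlite3

# Hypothetical sample data; table definitions are copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs(id PRIMARY KEY, description);
    CREATE TABLE dates(id PRIMARY KEY, job REFERENCES jobs(id), date);
    INSERT INTO jobs VALUES (1, 'paint fence'), (2, 'mow lawn'), (3, 'no dates yet');
    INSERT INTO dates VALUES
        (1, 1, '2024-01-05'), (2, 1, '2024-01-02'), (3, 1, '2024-01-09'),
        (4, 2, '2024-02-01');
""")

# Aggregate dates once per job in the subquery, then join the result to jobs.
AGGREGATE_THEN_JOIN = """
    SELECT id, j.description, d.start, d.finish
    FROM   jobs j
    LEFT   JOIN (
        SELECT job AS id, min(date) AS start, max(date) AS finish
        FROM   dates
        GROUP  BY job
    ) d USING (id)
    ORDER  BY j.id
"""

rows = conn.execute(AGGREGATE_THEN_JOIN).fetchall()
for row in rows:
    print(row)
```

With LEFT JOIN the job without dates is kept and gets NULL (None in Python) for start and finish; an INNER JOIN would drop it.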
About JOIN .. USING
It's not a "different type of join". USING (col) is a standard SQL (!) syntax shortcut for ON a.col = b.col. More precisely, quoting the manual:
The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.
Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.
It's particularly handy that you can write SELECT * FROM ... and the joining columns are only listed once.
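The column suppression is easy to check. The following is a small sketch, again with Python's sqlite3 module; the tables t1 and t2 and their columns are hypothetical, and the manual quoted above describes PostgreSQL, so treat this only as an illustration of the same behaviour in SQLite.

```python
import sqlite3

# Hypothetical two-table setup sharing the join column "a".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(a, b);
    CREATE TABLE t2(a, c);
    INSERT INTO t1 VALUES (1, 'left');
    INSERT INTO t2 VALUES (1, 'right');
""")

def column_names(sql):
    """Return the output column names of a query, in order."""
    return [col[0] for col in conn.execute(sql).description]

# JOIN ... USING lists the shared column once; JOIN ... ON keeps both copies.
using_cols = column_names("SELECT * FROM t1 JOIN t2 USING (a)")
on_cols = column_names("SELECT * FROM t1 JOIN t2 ON t1.a = t2.a")

print("USING:", using_cols)
print("ON:   ", on_cols)
```

The USING variant returns three output columns, the ON variant four.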
Besides the solution above, you can also use a window clause:
SELECT j.id, j.description,
first_value(d.date) OVER w AS start,
last_value(d.date) OVER w AS finish
FROM jobs j
JOIN dates d ON d.job = j.id
WINDOW w AS (PARTITION BY j.id ORDER BY d.date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
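Window functions effectively group by one or more columns (the PARTITION BY clause) and/or ORDER BY some other columns. You can then apply a window function, or even a regular aggregate function, without affecting the grouping or ordering of any other columns (description in your case). It requires a somewhat different way of constructing queries, but once you get it, it is pretty brilliant.
In your case you need the first value of a partition, which is easy because it is accessible by default. You also need to look beyond the window frame (which by default ends with the current row) to the last value in the partition, and for that you need the ROWS clause. Since you produce two columns using the same window definition, the WINDOW clause is used here; if the definition applied to a single column only, you could simply write the window function in the select list, followed by the OVER clause and the window definition without its name (that is, without WINDOW w AS (...)).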
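One thing worth checking when trying the window variant: window functions do not collapse rows the way GROUP BY does, so the join produces one output row per row in dates unless duplicates are removed. The sketch below compares the query with and without DISTINCT using Python's sqlite3 module. The sample data is made up, the {distinct} placeholder is just a formatting trick for this demo, and a SQLite version with window function support (3.25 or later) is assumed.

```python
import sqlite3

# Hypothetical sample data; table definitions are copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs(id PRIMARY KEY, description);
    CREATE TABLE dates(id PRIMARY KEY, job REFERENCES jobs(id), date);
    INSERT INTO jobs VALUES (1, 'paint fence'), (2, 'mow lawn');
    INSERT INTO dates VALUES
        (1, 1, '2024-01-05'), (2, 1, '2024-01-02'), (3, 1, '2024-01-09'),
        (4, 2, '2024-02-01');
""")

# The frame covers the whole partition, so first_value/last_value see every date of the job.
WINDOW_QUERY = """
    SELECT {distinct} j.id, j.description,
           first_value(d.date) OVER w AS start,
           last_value(d.date)  OVER w AS finish
    FROM   jobs j
    JOIN   dates d ON d.job = j.id
    WINDOW w AS (PARTITION BY j.id ORDER BY d.date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER  BY j.id
"""

per_date_row = conn.execute(WINDOW_QUERY.format(distinct="")).fetchall()
per_job = conn.execute(WINDOW_QUERY.format(distinct="DISTINCT")).fetchall()

print(len(per_date_row), "rows without DISTINCT")
for row in per_job:
    print(row)
```

Without DISTINCT each job's start and finish are repeated once per date; with DISTINCT the result has one row per job, matching the output the question asks for.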