Why can't I exclude dependent columns from `GROUP BY` when I aggregate by a key?

Question

Suppose I have the following tables (PostgreSQL in this example, but it could be any other relational database), where car has two keys (id and vin):

create table car (
  id int primary key not null,
  color varchar(10),
  brand varchar(10),
  vin char(17) unique not null
);

create table appraisal (
  id int primary key not null,
  recorded date not null,
  car_id int references car (id),
  car_vin char(17) references car (vin),
  price int
);

I can successfully include c.color and c.brand in the select list without aggregating them, since they depend on c.id:

select 
  c.id, c.color, c.brand,
  min(price) as min_appraisal,
  max(price) as max_appraisal
from car c
left join appraisal a on a.car_id = c.id
group by c.id; -- c.color, c.brand are not needed here

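However, the following query fails: it does not allow me to include c.color and c.brand in the select list, even though they depend on c.vin, which is also a key of the table.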

select 
  c.vin, c.color, c.brand,
  min(price) as min_appraisal,
  max(price) as max_appraisal
from car c
left join appraisal a on a.car_vin = c.vin
group by c.vin; -- Why are c.color, c.brand needed here?

Error: ERROR: column "c.color" must appear in the GROUP BY clause or be used in an aggregate function Position: 18

Example in DB Fiddle.

Answer 1

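Because PostgreSQL treats only the PK in the GROUP BY clause as covering all columns of the underlying table. That is why your first query works. A UNIQUE constraint does not get the same treatment.

The combination of a non-deferrable UNIQUE constraint and a NOT NULL constraint would also qualify in principle. But that is not implemented, and neither are some other functional dependencies known to the SQL standard. Peter Eisentraut, the principal author of the feature, had more in mind, but it was determined at the time that demand was low and the associated costs might be high. See the discussion about the feature on pgsql-hackers.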

The manual:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

And more explicitly:

PostgreSQL recognizes functional dependency (allowing columns to be omitted from GROUP BY) only when a table's primary key is included in the GROUP BY list. The SQL standard specifies additional conditions that should be recognized.

Since c.vin is UNIQUE NOT NULL, you can fix your second query by grouping by the PK column instead:

...
group by c.id;
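
For reference, here is a sketch of the second query from the question with that change applied, assuming the join on car_vin stays as it was. The second statement shows the other form PostgreSQL accepts, which is to list every selected column in GROUP BY:

```sql
-- Option 1: group by the primary key. c.vin, c.color and c.brand are
-- then recognized as functionally dependent on c.id.
select
  c.vin, c.color, c.brand,
  min(a.price) as min_appraisal,
  max(a.price) as max_appraisal
from car c
left join appraisal a on a.car_vin = c.vin
group by c.id;

-- Option 2: keep grouping by the unique column, but list the other
-- selected columns explicitly.
select
  c.vin, c.color, c.brand,
  min(a.price) as min_appraisal,
  max(a.price) as max_appraisal
from car c
left join appraisal a on a.car_vin = c.vin
group by c.vin, c.color, c.brand;
```

Both forms return one row per car, because c.vin is unique and each c.id therefore corresponds to exactly one c.vin.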

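Aside: as long as referential integrity is enforced and the whole table is queried, both of the given queries can be substantially cheaper if you aggregate the rows in appraisal before the join. This removes the need for GROUP BY in the outer SELECT in the first place. Like: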

SELECT c.vin, c.color, c.brand
     , a.min_appraisal
     , a.max_appraisal
FROM   car c
LEFT   JOIN (
   SELECT car_vin
        , min(price) AS min_appraisal
        , max(price) AS max_appraisal
   FROM   appraisal
   GROUP  BY car_vin
   ) a ON a.car_vin = c.vin;
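
To try the pre-aggregated form, a few rows are enough. The sample data below is purely hypothetical (the VINs, colors, brands and prices are made up for illustration) and assumes the car and appraisal tables from the question already exist and are empty:

```sql
-- Hypothetical sample data: two cars, only the first has appraisals.
INSERT INTO car (id, color, brand, vin) VALUES
  (1, 'red',  'Ford', '1FAHP3F20CL000001'),
  (2, 'blue', 'VW',   'WVWZZZ1JZ3W000002');

INSERT INTO appraisal (id, recorded, car_id, car_vin, price) VALUES
  (1, '2020-01-01', 1, '1FAHP3F20CL000001', 9000),
  (2, '2020-06-01', 1, '1FAHP3F20CL000001', 8500);

-- Aggregate appraisal first, then join. No GROUP BY in the outer query,
-- so the restriction on ungrouped columns never comes into play.
SELECT c.vin, c.color, c.brand
     , a.min_appraisal
     , a.max_appraisal
FROM   car c
LEFT   JOIN (
   SELECT car_vin
        , min(price) AS min_appraisal
        , max(price) AS max_appraisal
   FROM   appraisal
   GROUP  BY car_vin
   ) a ON a.car_vin = c.vin
ORDER  BY c.id;
-- Car 1: min_appraisal = 8500, max_appraisal = 9000
-- Car 2: both aggregates are NULL (no matching appraisal rows)
```

The LEFT JOIN keeps cars without appraisals in the result, matching the behavior of the original queries.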

See:

Multiple array_agg() calls in a single query

