SQL Condition on Window function

I'd like to run a particular query against my database (PostgreSQL v9.4.5), but I haven't managed to.

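For the sake of simplicity, suppose I have the following table AvgTemperatures, which holds average temperatures for different cities, each computed over a different length of time (in months):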

 id |   city    |  avg | months 
----+-----------+------+--------
  1 |  New-York |   20 |     3   <--- average temperature over the last 3 months
  2 |  New-York |   19 |     6   <--- average temperature over the last 6 months
  3 |  New-York |   15 |    12   <--- etc
  4 |  New-York |   15 |    24
  5 |    Boston |   13 |     3
  6 |    Boston |   18 |     8
  7 |    Boston |   17 |    12
  8 |    Boston |   16 |    15
  9 |   Chicago |   12 |     2
 10 |   Chicago |   14 |    12
 11 |     Miami |   28 |     1
 12 |     Miami |   25 |     4
 13 |     Miami |   21 |    12
 14 |     Miami |   22 |    15
 15 |     Miami |   20 |    24

Now imagine that I want to select all rows for the cities that have at least one average above 19 degrees. In this case, I want:

 id |   city    |  avg | months 
----+-----------+------+--------
  1 |  New-York |   20 |     3  
  2 |  New-York |   19 |     6  
  3 |  New-York |   15 |    12  
  4 |  New-York |   15 |    24  
 11 |     Miami |   28 |     1  
 12 |     Miami |   25 |     4  
 13 |     Miami |   21 |    12  
 14 |     Miami |   22 |    15  
 15 |     Miami |   20 |    24  

I could do something like:

 SELECT *
 FROM AvgTemperatures
 WHERE MIN(avg) OVER (PARTITION BY city) > 16

But:

********** Erreur **********

ERROR: window functions not allowed in WHERE clause

Furthermore, I can't use GROUP BY, as in:

 SELECT *
 FROM AvgTemperatures
 GROUP BY city
 HAVING MIN(avg) > 16

because the aggregation would lose information (and, incidentally, this query is invalid because of the "SELECT *").

I'm pretty sure I can solve this with OVER (PARTITION BY ...), but I don't know how. Does anyone have an idea?

All-at-once operation:

"All-at-Once Operations" means that all expressions in the same logical query process phase are evaluated logically at the same time.

And a great chapter on its impact on window functions:

Suppose you have:

CREATE TABLE Test ( Id INT) ;
 
INSERT  INTO Test VALUES  ( 1001 ), ( 1002 ) ;

SELECT Id
FROM Test
WHERE Id = 1002
  AND ROW_NUMBER() OVER(ORDER BY Id) = 1;

All-at-once operations tell us that these two conditions are evaluated logically at the same point in time. Therefore, SQL Server can evaluate the conditions in the WHERE clause in any order, based on the estimated execution plan. So the main question here is which condition is evaluated first.

Case 1:

If ( Id = 1002 ) is evaluated first, then ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ):

Result: 1002

Case 2:

If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ) is evaluated first, then ( Id = 1002 ):

Result: empty

So we have a paradox.

This example shows why window functions cannot be used in the WHERE clause. Thinking it through further also explains why window functions are allowed only in the SELECT and ORDER BY clauses.
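
The two cases can be reproduced by forcing each evaluation order explicitly with nested subqueries. A minimal sketch using Python's built-in sqlite3 module, chosen only so it runs without a database server (SQLite 3.25 or newer is assumed, since that is where ROW_NUMBER() appeared). It does not model how SQL Server would plan the original query; it only shows that the two orders give different answers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (Id INT)")
conn.executemany("INSERT INTO Test VALUES (?)", [(1001,), (1002,)])

# Case 1: apply Id = 1002 first, then number only the surviving rows.
case1 = conn.execute("""
    SELECT Id FROM (
        SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS rn
        FROM (SELECT Id FROM Test WHERE Id = 1002)
    ) WHERE rn = 1
""").fetchall()

# Case 2: number all rows first, then apply both conditions.
case2 = conn.execute("""
    SELECT Id FROM (
        SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS rn
        FROM Test
    ) WHERE rn = 1 AND Id = 1002
""").fetchall()

print(case1)  # [(1002,)]
print(case2)  # []
```

In case 2, 1001 receives row number 1 and 1002 receives 2, so no row satisfies both conditions.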

To get what you want, you can wrap the window function in a CTE or subquery, like this:

;WITH cte AS
(
  SELECT t.*, MAX(AVG) OVER (PARTITION BY city) AS average
  FROM avgTemperatures t
)
SELECT *
FROM cte
where average > 19
ORDER BY id;

Output:

╔═════╦══════════╦═════╦═════════╗
║ id  ║   city   ║ avg ║ months  ║
╠═════╬══════════╬═════╬═════════╣
║   1 ║ New-York ║  20 ║     3   ║
║   2 ║ New-York ║  19 ║     6   ║
║   3 ║ New-York ║  15 ║    12   ║
║   4 ║ New-York ║  15 ║    24   ║
║  11 ║ Miami    ║  28 ║     1   ║
║  12 ║ Miami    ║  25 ║     4   ║
║  13 ║ Miami    ║  21 ║    12   ║
║  14 ║ Miami    ║  22 ║    15   ║
║  15 ║ Miami    ║  20 ║    24   ║
╚═════╩══════════╩═════╩═════════╝

You need to wrap it in a derived table to be able to use it in the WHERE clause:

select *
from (
  SELECT t.*, MIN(avg) OVER (PARTITION BY city) as city_avg
  FROM AvgTemperatures t
) x
WHERE city_avg > 16
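
Note that with the sample data, the condition MIN(avg) > 16 used here (taken from the question) keeps only Miami, because New-York's minimum is 15; the stated requirement, at least one average above 19, corresponds to MAX(avg) > 19. A small sketch that runs both variants of this derived-table query, using Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25+ assumed for window functions; the helper ids_with_city_aggregate is hypothetical):

```python
import sqlite3

# Sample rows from the AvgTemperatures table: (id, city, avg, months)
ROWS = [
    (1, "New-York", 20, 3), (2, "New-York", 19, 6),
    (3, "New-York", 15, 12), (4, "New-York", 15, 24),
    (5, "Boston", 13, 3), (6, "Boston", 18, 8),
    (7, "Boston", 17, 12), (8, "Boston", 16, 15),
    (9, "Chicago", 12, 2), (10, "Chicago", 14, 12),
    (11, "Miami", 28, 1), (12, "Miami", 25, 4),
    (13, "Miami", 21, 12), (14, "Miami", 22, 15),
    (15, "Miami", 20, 24),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AvgTemperatures (id INT, city TEXT, avg INT, months INT)")
conn.executemany("INSERT INTO AvgTemperatures VALUES (?, ?, ?, ?)", ROWS)


def ids_with_city_aggregate(aggregate, condition):
    """Hypothetical helper: run the derived-table pattern with a given window aggregate.

    aggregate and condition are spliced into the SQL text, so pass only fixed strings.
    """
    sql = f"""
        SELECT id
        FROM (
            SELECT t.*, {aggregate} OVER (PARTITION BY city) AS city_stat
            FROM AvgTemperatures t
        ) x
        WHERE city_stat {condition}
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]


print(ids_with_city_aggregate("MIN(avg)", "> 16"))  # [11, 12, 13, 14, 15]
print(ids_with_city_aggregate("MAX(avg)", "> 19"))  # [1, 2, 3, 4, 11, 12, 13, 14, 15]
```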

Use a subquery to get the per-city maximum, then filter on it in the WHERE clause:

select t.*
from (select t.*, max(avg) over (partition by city) as maxavg
      from avgTemperatures t
     ) t
where maxavg > 19;

Another approach is to do it directly in the WHERE clause:

select t.*
from avgTemperatures t
where t.city in (select t2.city from avgTemperatures t2 where t2.avg > 19);

The simplest solution is to use the bool_or aggregate function:

select id, city, avg, months
from avttemperatures
where city in (
    select city
    from avttemperatures
    group by 1
    having bool_or(avg > 19)
)
order by  2, 4
;
 id |   city   | avg | months 
----+----------+-----+--------
 11 | Miami    |  28 |      1
 12 | Miami    |  25 |      4
 13 | Miami    |  21 |     12
 14 | Miami    |  22 |     15
 15 | Miami    |  20 |     24
  1 | New-York |  20 |      3
  2 | New-York |  19 |      6
  3 | New-York |  15 |     12
  4 | New-York |  15 |     24

Test table:

create table avttemperatures (
    id int, city text, avg int, months int
);
insert into avttemperatures (id, city, avg, months) values
(  1,'New-York',20,3),
(  2,'New-York',19,6),
(  3,'New-York',15,12),
(  4,'New-York',15,24),
(  5,'Boston',13,3),
(  6,'Boston',18,8),
(  7,'Boston',17,12),
(  8,'Boston',16,15),
(  9,'Chicago',12,2),
( 10,'Chicago',14,12),
( 11,'Miami',28,1),
( 12,'Miami',25,4),
( 13,'Miami',21,12),
( 14,'Miami',22,15),
( 15,'Miami',20,24);
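
bool_or is PostgreSQL-specific. For comparison, here is a sketch of the same query in SQLite, driven from Python's sqlite3 module; the substitution of MAX(avg > 19) = 1 for bool_or(avg > 19) is an assumption of this sketch, relying on SQLite evaluating a comparison to 1 (true) or 0 (false):

```python
import sqlite3

# Same data as the avttemperatures test table: (id, city, avg, months)
ROWS = [
    (1, "New-York", 20, 3), (2, "New-York", 19, 6),
    (3, "New-York", 15, 12), (4, "New-York", 15, 24),
    (5, "Boston", 13, 3), (6, "Boston", 18, 8),
    (7, "Boston", 17, 12), (8, "Boston", 16, 15),
    (9, "Chicago", 12, 2), (10, "Chicago", 14, 12),
    (11, "Miami", 28, 1), (12, "Miami", 25, 4),
    (13, "Miami", 21, 12), (14, "Miami", 22, 15),
    (15, "Miami", 20, 24),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE avttemperatures (id INT, city TEXT, avg INT, months INT)")
conn.executemany("INSERT INTO avttemperatures VALUES (?, ?, ?, ?)", ROWS)

# MAX(avg > 19) = 1 holds exactly when at least one row of the group has avg > 19,
# which is the condition bool_or(avg > 19) tests in PostgreSQL.
query = """
    SELECT id, city, avg, months
    FROM avttemperatures
    WHERE city IN (
        SELECT city
        FROM avttemperatures
        GROUP BY city
        HAVING MAX(avg > 19) = 1
    )
    ORDER BY city, months
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # Miami rows (ids 11..15) first, then New-York (ids 1..4)
```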

If you only want to know whether at least one such row exists, there is no need to aggregate:

SELECT id, city, avg, months
FROM avgtemperatures t
WHERE EXISTS ( SELECT 42
    FROM avgtemperatures x
    WHERE x.city = t.city
    AND x.avg > 19
    )
ORDER BY city,months DESC
   ;

Note: avg is a bad name for a column, since it is also the name of an aggregate function.
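
As a sanity check, the window-function, IN and EXISTS formulations can be run side by side on the sample data. A sketch using Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25+ assumed for the window function); all three should return the same set of ids:

```python
import sqlite3

# Sample rows: (id, city, avg, months)
ROWS = [
    (1, "New-York", 20, 3), (2, "New-York", 19, 6),
    (3, "New-York", 15, 12), (4, "New-York", 15, 24),
    (5, "Boston", 13, 3), (6, "Boston", 18, 8),
    (7, "Boston", 17, 12), (8, "Boston", 16, 15),
    (9, "Chicago", 12, 2), (10, "Chicago", 14, 12),
    (11, "Miami", 28, 1), (12, "Miami", 25, 4),
    (13, "Miami", 21, 12), (14, "Miami", 22, 15),
    (15, "Miami", 20, 24),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE avgtemperatures (id INT, city TEXT, avg INT, months INT)")
conn.executemany("INSERT INTO avgtemperatures VALUES (?, ?, ?, ?)", ROWS)

queries = {
    # Window function computed in a CTE, filtered outside it.
    "cte_window": """
        WITH cte AS (
            SELECT t.*, MAX(avg) OVER (PARTITION BY city) AS city_max
            FROM avgtemperatures t
        )
        SELECT id FROM cte WHERE city_max > 19
    """,
    # Semi-join through IN with an uncorrelated subquery.
    "in_subquery": """
        SELECT id FROM avgtemperatures t
        WHERE t.city IN (SELECT t2.city FROM avgtemperatures t2 WHERE t2.avg > 19)
    """,
    # Semi-join through a correlated EXISTS; the selected constant is irrelevant.
    "exists": """
        SELECT id FROM avgtemperatures t
        WHERE EXISTS (SELECT 42 FROM avgtemperatures x
                      WHERE x.city = t.city AND x.avg > 19)
    """,
}

results = {name: sorted(r[0] for r in conn.execute(sql)) for name, sql in queries.items()}
for name, ids in results.items():
    print(name, ids)
```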