SQL condition on a window function
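I want to run a particular query against my database (PostgreSQL v9.4.5), but I can't get it to work.
To keep things simple, suppose I have the following table AvgTemperatures, which holds average temperatures for different cities, computed over different lengths of time (in months):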
id | city | avg | months
----+-----------+------+--------
1 | New-York | 20 | 3 <--- average temperature over the last 3 months
2 | New-York | 19 | 6 <--- average temperature over the last 6 months
3 | New-York | 15 | 12 <--- etc
4 | New-York | 15 | 24
5 | Boston | 13 | 3
6 | Boston | 18 | 8
7 | Boston | 17 | 12
8 | Boston | 16 | 15
9 | Chicago | 12 | 2
10 | Chicago | 14 | 12
11 | Miami | 28 | 1
12 | Miami | 25 | 4
13 | Miami | 21 | 12
14 | Miami | 22 | 15
15 | Miami | 20 | 24
Now, suppose I want to select all rows belonging to cities in which at least one average is above 19 degrees. In this case I want:
id | city | avg | months
----+-----------+------+--------
1 | New-York | 20 | 3
2 | New-York | 19 | 6
3 | New-York | 15 | 12
4 | New-York | 15 | 24
11 | Miami | 28 | 1
12 | Miami | 25 | 4
13 | Miami | 21 | 12
14 | Miami | 22 | 15
15 | Miami | 20 | 24
I could do something like:
SELECT *
FROM AvgTemperatures
WHERE MIN(avg) OVER (PARTITION BY city) > 16
But:
********** Erreur **********
ERROR: window functions not allowed in WHERE clause
Moreover, I cannot use GROUP BY like this:
SELECT *
FROM AvgTemperatures
GROUP BY city
HAVING MIN(avg) > 16
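because I would lose information through the aggregation (besides, this query is invalid because of the "SELECT *").
I am fairly sure I can solve this with OVER (PARTITION BY ...), but I don't see how. Does anyone have an idea?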
"All-at-Once Operations" means that all expressions in the same
logical query process phase are evaluated logically at the same time.
A great chapter, "Impact on Window Functions":
Suppose you have:
CREATE TABLE Test ( Id INT) ;
INSERT INTO Test VALUES ( 1001 ), ( 1002 ) ;
SELECT Id
FROM Test
WHERE Id = 1002
AND ROW_NUMBER() OVER(ORDER BY Id) = 1;
All-at-Once operations tell us these two conditions are evaluated logically at the same point in time. Therefore, SQL Server can
evaluate the conditions in the WHERE clause in arbitrary order, based on
the estimated execution plan. So the main question here is which condition
is evaluated first.
Case 1:
If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )
Result: 1002
Case 2:
If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )
Result: empty
So we have a paradox.
This example shows why we cannot use Window Functions in WHERE clause.
You can think more about this and find why Window Functions are
allowed to be used just in SELECT and ORDER BY clauses!
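The two cases can be made explicit by forcing each evaluation order with a derived table. This is only a sketch for PostgreSQL: the temporary table name test_demo and the aliases a, b and rn are made up for illustration, so the Test table above is left untouched.

```sql
-- Hypothetical scratch table mirroring the Test table above.
CREATE TEMP TABLE test_demo (id int);
INSERT INTO test_demo VALUES (1001), (1002);

-- Case 1: filter first, then number the surviving rows.
-- WHERE runs before the window function, so 1002 is the only row and gets rn = 1.
SELECT id
FROM (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM test_demo
    WHERE id = 1002
) a
WHERE rn = 1;
-- Returns one row: 1002

-- Case 2: number all rows first, then filter.
-- 1001 gets rn = 1 and 1002 gets rn = 2, so no row satisfies both conditions.
SELECT id
FROM (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM test_demo
) b
WHERE rn = 1
  AND id = 1002;
-- Returns no rows
```

Writing the subquery yourself removes the ambiguity: the order of filtering and numbering is stated in the query rather than left to the planner.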
To get what you want, you can wrap the window function in a CTE/subquery, like this:
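WITH cte AS
(
    -- Attach each city's highest average to every one of its rows
    SELECT t.*, MAX(avg) OVER (PARTITION BY city) AS average
    FROM avgTemperatures t
)
-- List the original columns so the helper column is not returned
SELECT id, city, avg, months
FROM cte
WHERE average > 19
ORDER BY id;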
Output:
╔═════╦══════════╦═════╦═════════╗
║ id ║ city ║ avg ║ months ║
╠═════╬══════════╬═════╬═════════╣
║ 1 ║ New-York ║ 20 ║ 3 ║
║ 2 ║ New-York ║ 19 ║ 6 ║
║ 3 ║ New-York ║ 15 ║ 12 ║
║ 4 ║ New-York ║ 15 ║ 24 ║
║ 11 ║ Miami ║ 28 ║ 1 ║
║ 12 ║ Miami ║ 25 ║ 4 ║
║ 13 ║ Miami ║ 21 ║ 12 ║
║ 14 ║ Miami ║ 22 ║ 15 ║
║ 15 ║ Miami ║ 20 ║ 24 ║
╚═════╩══════════╩═════╩═════════╝
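You need to wrap it in a derived table to be able to use it in the WHERE clause (this keeps the condition from the attempt in the question; for "at least one average above 19", use MAX(avg) and > 19 instead):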
select *
from (
SELECT t.*, MIN(avg) OVER (PARTITION BY city) as city_avg
FROM AvgTemperatures t
) x
WHERE city_avg > 16
Use a subquery to get the maximum and then filter with WHERE:
select t.*
from (select t.*, max(avg) over (partition by city) as maxavg
from avgTemperatures t
) t
where maxavg > 19;
Another way is to do this in the WHERE clause:
select t.*
from avgTemperatures t
where t.city in (select t2.city from avgTemperatures t2 where t2.avg > 19);
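A related variant, not taken from the answers here, is to aggregate once per city and join the result back, which recovers the rows that a plain GROUP BY would collapse. A minimal sketch, assuming PostgreSQL; the table name avg_temperatures_demo, the alias hot and the reduced set of rows are hypothetical and chosen only to keep the example short.

```sql
-- Hypothetical scratch table with a subset of the rows from the question.
CREATE TEMP TABLE avg_temperatures_demo (id int, city text, avg int, months int);
INSERT INTO avg_temperatures_demo (id, city, avg, months) VALUES
    ( 1, 'New-York', 20,  3),
    ( 3, 'New-York', 15, 12),
    ( 5, 'Boston',   13,  3),
    ( 6, 'Boston',   18,  8),
    (11, 'Miami',    28,  1);

-- Aggregate per city, keep only cities whose highest average exceeds 19,
-- then join back to return every original row of those cities.
SELECT t.*
FROM avg_temperatures_demo t
JOIN (
    SELECT city
    FROM avg_temperatures_demo
    GROUP BY city
    HAVING MAX(avg) > 19
) hot ON hot.city = t.city
ORDER BY t.id;
-- Returns ids 1, 3 and 11; both Boston rows are filtered out.
```

Because the derived table yields at most one row per city, the join cannot duplicate rows of the outer table.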
The simplest solution is to use the bool_or aggregate function:
select id, city, avg, months
from avttemperatures
where city in (
select city
from avttemperatures
group by 1
having bool_or(avg > 19)
)
order by 2, 4
;
id | city | avg | months
----+----------+-----+--------
11 | Miami | 28 | 1
12 | Miami | 25 | 4
13 | Miami | 21 | 12
14 | Miami | 22 | 15
15 | Miami | 20 | 24
1 | New-York | 20 | 3
2 | New-York | 19 | 6
3 | New-York | 15 | 12
4 | New-York | 15 | 24
Test table:
create table avttemperatures (
id int, city text, avg int, months int
);
insert into avttemperatures (id, city, avg, months) values
( 1,'New-York',20,3),
( 2,'New-York',19,6),
( 3,'New-York',15,12),
( 4,'New-York',15,24),
( 5,'Boston',13,3),
( 6,'Boston',18,8),
( 7,'Boston',17,12),
( 8,'Boston',16,15),
( 9,'Chicago',12,2),
( 10,'Chicago',14,12),
( 11,'Miami',28,1),
( 12,'Miami',25,4),
( 13,'Miami',21,12),
( 14,'Miami',22,15),
( 15,'Miami',20,24);
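Since PostgreSQL lets aggregate functions be called with OVER, bool_or can also be used as a window function, combining this idea with the derived-table approach. The following is a sketch that assumes the avttemperatures test table created just above; the aliases x and has_hot are made up for illustration.

```sql
-- Flag every row with whether its city has at least one average above 19,
-- then filter on that flag in the outer query.
SELECT id, city, avg, months
FROM (
    SELECT t.*,
           bool_or(avg > 19) OVER (PARTITION BY city) AS has_hot
    FROM avttemperatures t
) x
WHERE has_hot
ORDER BY city, months;
```

This reads the table once instead of using an IN subquery; which form performs better depends on the data and indexes, so it is worth checking with EXPLAIN.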
If you only want to know whether at least one exists, there is no need to aggregate:
SELECT id, city, avg, months
FROM avgtemperatures t
WHERE EXISTS ( SELECT 42
FROM avgtemperatures x
WHERE x.city = t.city
AND x.avg > 19
)
ORDER BY city,months DESC
;
Note: avg is a bad name for a column.
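The column name works in PostgreSQL, but it is also the name of a built-in aggregate, so expressions such as MAX(avg) are easy to misread. One way to avoid that is to rename the column. A minimal sketch; the table name avg_temperatures_renamed, the new column name avg_temp and the sample rows are hypothetical.

```sql
-- Hypothetical scratch table with a few rows from the question.
CREATE TEMP TABLE avg_temperatures_renamed (id int, city text, avg int, months int);
INSERT INTO avg_temperatures_renamed (id, city, avg, months) VALUES
    ( 1, 'New-York', 20,  3),
    ( 2, 'New-York', 19,  6),
    ( 5, 'Boston',   13,  3),
    (11, 'Miami',    28,  1);

-- Give the column a name that does not clash with the avg() aggregate.
ALTER TABLE avg_temperatures_renamed RENAME COLUMN avg TO avg_temp;

-- The EXISTS query from above, now without the ambiguous-looking name.
SELECT id, city, avg_temp, months
FROM avg_temperatures_renamed t
WHERE EXISTS (
    SELECT 1
    FROM avg_temperatures_renamed x
    WHERE x.city = t.city
      AND x.avg_temp > 19
)
ORDER BY id;
-- Returns ids 1, 2 and 11; the Boston row is filtered out.
```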