How to do MAX(evaluation_expression, return_expression) in SQL
When I run a MIN or MAX statement, I often want the value of another column from the same row. For example, in the following statement:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
) SELECT MAX(age) FROM people;
# MAX(age)
20
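The MAX function here is effectively MAX(eval_expression=age, return_expression=age): the expression it evaluates and the expression it returns are always (implicitly) the same. However, I want the name of the person with the greatest age, so the conceptual syntax would be MAX(eval_expression=age, return_expression=name). I run into this pattern a lot, and I usually end up hacking something together like this: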
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
) SELECT name FROM people NATURAL JOIN (SELECT MAX(age) AS age FROM people) _;
# name
'Greg'
Is there a general way to do the MAX(expr, return) I am after?
Update: here is an example that requires aggregation:
with sales as (
select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
) select date, max(sales) from sales group by date
# date, max(sales)
2014-01-01, 105
2014-01-02, 84
And how would I get the equivalent of MAX(expr=sales, return=product)? Something like:
WITH sales AS (
select DATE '2014-01-01' as d, 100 as revenue, 'Fish' as product union
select DATE '2014-01-01' as d, 105 as revenue, 'Potatoes' as product union
select DATE '2014-01-02' as d, 84 as revenue, 'Salsa' as product
) SELECT d AS date, product FROM sales NATURAL JOIN (SELECT d, MAX(revenue) AS revenue FROM sales GROUP BY d) _;
# date, product
2014-01-01, Potatoes
2014-01-02, Salsa
Unless I'm missing something here, use ORDER BY and LIMIT:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
)
SELECT name
FROM people
ORDER BY age DESC
LIMIT 1;
# name
'Greg'
One solution is to use a window function with an unbounded frame, such as FIRST_VALUE or LAST_VALUE, ordering each date partition by sales. Here is an example:
;WITH sales AS (
select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
select DATE '2014-01-01' as date, 103 as sales, 'Lettuce' as product union
select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
)
SELECT DISTINCT date, LAST_VALUE(product) OVER (
partition by date
order by sales
-- Default: https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html
-- rows between unbounded preceding and current row
rows between unbounded preceding and unbounded following
) top_product
FROM sales;
# date, top_product
'2014-01-01', 'Potatoes'
'2014-01-02', 'Salsa'
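I think the subselect may be easier to read (at least for me), but this is another option. You would have to check the performance of both, but I expect the analytic function (which avoids a non-indexable join) to be much faster.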
If you want to use first_value(), I would recommend:
select distinct date,
first_value(product) over(partition by date order by sales desc) top_product
from sales
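There is no need for aggregation here, nor for a frame specification in the window function. The window function walks each partition starting from the row with the greatest sales, so every row in the partition is assigned the same top_product. Then distinct keeps only one row per partition.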
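To make the frame point concrete, here is a minimal sketch that runs three variants against the four-row sample data. It assumes SQLite 3.25+ through Python's sqlite3 module purely so the example is self-contained; the column names d and revenue follow the question's second example, and the run helper and the query constants are hypothetical names introduced for illustration. Other engines may differ in details.

```python
import sqlite3

# Sample rows from the answer above (dates stored as text for SQLite).
ROWS = [
    ("2014-01-01", 100, "Fish"),
    ("2014-01-01", 105, "Potatoes"),
    ("2014-01-01", 103, "Lettuce"),
    ("2014-01-02", 84, "Salsa"),
]


def run(query):
    """Load ROWS into a fresh in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (d TEXT, revenue INTEGER, product TEXT)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Descending order + FIRST_VALUE: the default frame already starts at the top row.
FIRST_VALUE_DESC = """
SELECT DISTINCT d,
       FIRST_VALUE(product) OVER (PARTITION BY d ORDER BY revenue DESC) AS top_product
FROM sales
ORDER BY d
"""

# Ascending order + LAST_VALUE with the default frame: the frame ends at the
# current row, so each row just sees its own product.
LAST_VALUE_DEFAULT_FRAME = """
SELECT DISTINCT d,
       LAST_VALUE(product) OVER (PARTITION BY d ORDER BY revenue) AS top_product
FROM sales
ORDER BY d, top_product
"""

# Same, but with the frame widened to the whole partition.
LAST_VALUE_FULL_FRAME = """
SELECT DISTINCT d,
       LAST_VALUE(product) OVER (
           PARTITION BY d ORDER BY revenue
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS top_product
FROM sales
ORDER BY d
"""

if __name__ == "__main__":
    for label, query in [
        ("first_value desc", FIRST_VALUE_DESC),
        ("last_value default frame", LAST_VALUE_DEFAULT_FRAME),
        ("last_value full frame", LAST_VALUE_FULL_FRAME),
    ]:
        print(label, run(query))
```

On this data the first and third queries return one row per date, while the second returns every product, which is why the LAST_VALUE variant needs the explicit frame and the FIRST_VALUE variant does not.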
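Fundamentally, though, this is a greatest-n-per-group problem: you want the row with the greatest sales for each date. If you need more than one column from that row, the first_value() solution does not scale well. A typical solution is to rank the records in a subquery and then filter. Again, no aggregation is needed; this is just filtering logic: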
select *
from (
select s.*,
row_number() over(partition by date order by sales desc) rn
from sales s
) t
where rn = 1
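As a rough check of the ranking approach, the sketch below runs the same row_number() filter and returns every column of the winning row. It assumes SQLite 3.25+ via Python's sqlite3 module only to keep the example self-contained; the names d and revenue follow the question's second example, and top_rows_per_date is a hypothetical helper, not part of any library.

```python
import sqlite3

# Sample rows from the window-function example above (dates stored as text).
ROWS = [
    ("2014-01-01", 100, "Fish"),
    ("2014-01-01", 105, "Potatoes"),
    ("2014-01-01", 103, "Lettuce"),
    ("2014-01-02", 84, "Salsa"),
]

# Rank rows within each date by revenue, highest first, then keep rank 1.
TOP_ROW_QUERY = """
SELECT d, revenue, product
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY d ORDER BY revenue DESC) AS rn
    FROM sales s
) t
WHERE rn = 1
ORDER BY d
"""


def top_rows_per_date(rows):
    """Return the full highest-revenue row for each date."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (d TEXT, revenue INTEGER, product TEXT)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_ROW_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_rows_per_date(ROWS):
        print(row)
```

Because the filter keeps whole rows, adding more columns to the output only means listing them in the outer select, with no extra window function per column.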