Way to do MAX(evaluation_expression, return_expression) in SQL

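When running a MIN/MAX statement, I often find myself wanting the value of an adjacent column from the same row. For example, in the following statement: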

WITH people AS (
    select 'Greg' as name, 20 as age union
    select 'Tom' as name, 17 as age
) SELECT MAX(age) FROM people;

# MAX(age)
20

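The MAX function is effectively MAX(eval_expression=age, return_expression=age): the expression it evaluates and the expression it returns are always (implicitly) the same. However, I want to find the name of the person with the maximum age, so the conceptual syntax would be MAX(eval_expression=age, return_expression=name). This is a pattern I find myself using often, and I usually end up hacking something together like this: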

WITH people AS (
    select 'Greg' as name, 20 as age union
    select 'Tom' as name, 17 as age
) SELECT name FROM people NATURAL JOIN (SELECT MAX(age) age FROM people) _;

# name
'Greg'

Is there a general way to do the MAX(expr, return) I'm trying to accomplish?


Update: here is an example that requires aggregation:

with sales as (
    select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
    select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
    select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
)  select date, max(sales) from sales group by date

# date, max(sales)
2014-01-01, 105
2014-01-02, 84

And how would I get the equivalent of MAX(expr=sales, return=product)? Something like:

WITH sales AS (
    select DATE '2014-01-01' as d, 100 as revenue, 'Fish' as product union
    select DATE '2014-01-01' as d, 105 as revenue, 'Potatoes' as product union
    select DATE '2014-01-02' as d, 84 as revenue, 'Salsa' as product
) SELECT d AS date, product FROM sales NATURAL JOIN (SELECT d, MAX(revenue) AS revenue FROM sales GROUP BY d) _;

# date, product
2014-01-01, Potatoes
2014-01-02, Salsa

Unless I'm missing something here, use ORDER BY with LIMIT:

WITH people AS (
    select 'Greg' as name, 20 as age union
    select 'Tom' as name, 17 as age
)
SELECT name
FROM people 
ORDER BY age DESC
LIMIT 1;

# name
'Greg'
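
For the grouped example in the question's update, a single LIMIT 1 is not enough, because one row per date is needed. One possible extension, shown here only as a sketch, runs the same ORDER BY ... LIMIT 1 as a correlated subquery. It assumes a dialect that accepts LIMIT inside a scalar subquery (MySQL 8, PostgreSQL and SQLite do) and reuses the column names d and revenue from the question's second query:

```sql
-- Sketch only: ORDER BY ... LIMIT 1 applied once per date via a correlated subquery.
-- Sample data is the sales example from the question's update.
WITH sales AS (
    SELECT DATE '2014-01-01' AS d, 100 AS revenue, 'Fish' AS product UNION
    SELECT DATE '2014-01-01' AS d, 105 AS revenue, 'Potatoes' AS product UNION
    SELECT DATE '2014-01-02' AS d, 84 AS revenue, 'Salsa' AS product
)
SELECT DISTINCT
    s1.d,
    -- For each outer row, pick the product with the highest revenue on that date.
    (SELECT s2.product
     FROM sales s2
     WHERE s2.d = s1.d
     ORDER BY s2.revenue DESC
     LIMIT 1) AS top_product
FROM sales s1
ORDER BY s1.d;

-- d, top_product
-- 2014-01-01, Potatoes
-- 2014-01-02, Salsa
```

If two products tie for the highest revenue on a date, the subquery returns whichever one the engine happens to sort first, so a tie-breaking column in the ORDER BY may be worth adding.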

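One solution is to use a window function with an unbounded frame, such as FIRST_VALUE or LAST_VALUE, where you partition by date and order by sales. Here is an example: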

WITH sales AS (
    select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
    select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
    select DATE '2014-01-01' as date, 103 as sales, 'Lettuce' as product union
    select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
)  

SELECT DISTINCT date, LAST_VALUE(product) OVER (
    partition by date
    order by sales
    -- Default frame with ORDER BY (https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html):
    --   range between unbounded preceding and current row
    -- so the frame is widened to cover the whole partition:
    rows between unbounded preceding and unbounded following
) top_product
FROM sales;

# date, top_product
'2014-01-01', 'Potatoes'
'2014-01-02', 'Salsa'

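I think the subselect may be easier to read (at least for me), but this is another option. You would have to check the performance of both, but I would expect the analytic function (which avoids a non-indexable join) to be much faster.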

If you want to use first_value(), I would recommend:

select distinct date, 
    first_value(product) over(partition by date order by sales desc) top_product
from sales

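No aggregation (GROUP BY) is needed here, nor a frame specification in the window function. Because the window is ordered by sales descending, every row's frame starts at the row with the greatest sales, so all rows in a partition are assigned the same top_product. DISTINCT then keeps only one row per partition.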
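
To make that behavior visible, the following sketch shows what the window function produces before DISTINCT collapses the rows. It uses the question's sample data with the column names d and revenue from the question's second query; the extra product and revenue columns are included only for illustration:

```sql
-- Sketch: inspect the per-row result of FIRST_VALUE before de-duplication.
WITH sales AS (
    SELECT DATE '2014-01-01' AS d, 100 AS revenue, 'Fish' AS product UNION
    SELECT DATE '2014-01-01' AS d, 105 AS revenue, 'Potatoes' AS product UNION
    SELECT DATE '2014-01-02' AS d, 84 AS revenue, 'Salsa' AS product
)
SELECT
    d,
    product,
    revenue,
    -- Every row in a date partition sees the same first row (highest revenue).
    FIRST_VALUE(product) OVER (PARTITION BY d ORDER BY revenue DESC) AS top_product
FROM sales
ORDER BY d, revenue DESC;

-- d, product, revenue, top_product
-- 2014-01-01, Potatoes, 105, Potatoes
-- 2014-01-01, Fish, 100, Potatoes
-- 2014-01-02, Salsa, 84, Salsa
```

Selecting only d and top_product with DISTINCT then leaves one row per date.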

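But basically, this ends up being a greatest-n-per-group problem: you want the row with the greatest sales for each date. If you want more than one column from that row, the first_value() solution does not scale well. A typical solution is to rank the records in a subquery and then filter. Again, no aggregation is needed; this is just filtering logic: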

select *
from (
    select s.*,
        row_number() over(partition by date order by sales desc) rn
    from sales s
) t
where rn = 1
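
As a self-contained sketch of the rank-then-filter approach, the query below runs against the question's sample data. The column names d and revenue are taken from the question's second query, and the explicit column list in the outer SELECT is an illustrative choice that hides the helper column rn:

```sql
-- Sketch: greatest-n-per-group (n = 1) with ROW_NUMBER, on the question's data.
WITH sales AS (
    SELECT DATE '2014-01-01' AS d, 100 AS revenue, 'Fish' AS product UNION
    SELECT DATE '2014-01-01' AS d, 105 AS revenue, 'Potatoes' AS product UNION
    SELECT DATE '2014-01-02' AS d, 84 AS revenue, 'Salsa' AS product
)
SELECT d, product, revenue
FROM (
    SELECT
        s.*,
        -- Rank rows within each date, highest revenue first.
        ROW_NUMBER() OVER (PARTITION BY d ORDER BY revenue DESC) AS rn
    FROM sales s
) t
WHERE rn = 1   -- keep only the top-ranked row per date
ORDER BY d;

-- d, product, revenue
-- 2014-01-01, Potatoes, 105
-- 2014-01-02, Salsa, 84
```

ROW_NUMBER() returns exactly one row per date even when revenues tie. If all tied rows should be returned, RANK() could be used in its place.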