Use ORDER BY in OVER clause

I'm new to T-SQL and window functions.

I don't understand why the following two queries produce the same result:

SELECT 
    empid, ordermonth, val,
    SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runval
FROM 
    Sales.EmpOrders;

SELECT 
    empid, ordermonth, val,
    SUM(val) OVER(PARTITION BY empid ORDER BY ordermonth) AS runval
FROM 
    Sales.EmpOrders;

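Both queries return the same output.

Shouldn't the second query produce the same total on every row of each empid? Or is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW the default, and therefore optional, when ORDER BY is used in the OVER clause?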

If you want the same value on every row of each empid, leave out the ORDER BY:

SELECT empid, ordermonth, val,
       SUM(val) OVER (PARTITION BY empid) AS runval
FROM Sales.EmpOrders;

Otherwise, your two expressions are equivalent, provided the sort key is unique within each partition. The default is explained in the documentation:

If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame.
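
As a sketch of what that default means for the second query in the question, the frame can be written out explicitly. This assumes SQL Server 2012 or later and the question's Sales.EmpOrders table; under the documented default it should return the same runval as the version with no frame clause:

```sql
-- The second query from the question with its implicit frame spelled out.
-- Note the keyword is RANGE, not ROWS.
SELECT
    empid, ordermonth, val,
    SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth
                   RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runval
FROM
    Sales.EmpOrders;
```

The only difference from the first query in the question is RANGE in place of ROWS, which matters only when ordermonth has duplicates within an empid.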

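For a running sum (and similar aggregates), the difference becomes visible when two rows tie on the ORDER BY column. Consider this example, in which the employee has two orders on 2006-09-01:
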
DECLARE @T TABLE (empid INT, ordermonth DATE, val INT);
INSERT INTO @T VALUES
(1, '2006-07-01', 100),
(1, '2006-08-01', 100),
(1, '2006-09-01', 100),
(1, '2006-09-01', 100),
(1, '2006-10-01', 100);

SELECT empid, ordermonth, val,
   runval_rows = SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
   runval_auto = SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth)
FROM @T;

empid | ordermonth | val | runval_rows | runval_auto
1     | 2006-07-01 | 100 | 100         | 100
1     | 2006-08-01 | 100 | 200         | 200
1     | 2006-09-01 | 100 | 300*        | 400*
1     | 2006-09-01 | 100 | 400*        | 400*
1     | 2006-10-01 | 100 | 500         | 500

If no ROWS/RANGE clause is specified, SQL Server falls back to this default:

If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame.

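In the simplest terms, SQL Server defines a range* as the set of rows within a partition that have the same value in the column(s) specified in the ORDER BY clause. The second variant therefore treats rows 3 and 4 as part of the same range and includes both of them when computing the running sum.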

* Note that this definition differs from the "standard" definition; this answer applies to SQL Server only.
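
One way to see which rows each frame covers is to count them instead of summing them. The following is a minimal sketch that reuses the sample data from the example above and assumes SQL Server 2012 or later; the column names cnt_rows and cnt_range are made up for illustration:

```sql
DECLARE @T TABLE (empid INT, ordermonth DATE, val INT);
INSERT INTO @T VALUES
(1, '2006-07-01', 100),
(1, '2006-08-01', 100),
(1, '2006-09-01', 100),
(1, '2006-09-01', 100),
(1, '2006-10-01', 100);

-- COUNT(*) over a frame returns how many rows that frame contains.
SELECT empid, ordermonth,
   -- ROWS: the frame ends at the current physical row
   cnt_rows  = COUNT(*) OVER (PARTITION BY empid ORDER BY ordermonth
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
   -- RANGE: the frame also includes every row tied with the current row
   cnt_range = COUNT(*) OVER (PARTITION BY empid ORDER BY ordermonth
                              RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM @T
ORDER BY ordermonth, cnt_rows;

-- ordermonth | cnt_rows | cnt_range
-- 2006-07-01 | 1        | 1
-- 2006-08-01 | 2        | 2
-- 2006-09-01 | 3        | 4
-- 2006-09-01 | 4        | 4
-- 2006-10-01 | 5        | 5
```

Both 2006-09-01 rows see four rows under RANGE, which is why runval_auto shows 400 on each of them, while under ROWS the two tied rows see three and four rows respectively.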