Use ORDER BY in OVER clause
I'm new to T-SQL and window functions.
I don't understand why the following two queries produce the same results:
SELECT
empid, ordermonth, val,
SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runval
FROM
Sales.EmpOrders;
and
SELECT
empid, ordermonth, val,
SUM(val) OVER(PARTITION BY empid ORDER BY ordermonth) AS runval
FROM
Sales.EmpOrders;
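The output is identical.
Shouldn't the second query produce the same total value for every row of a given empid? Or is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
the default, and therefore optional, when ORDER BY is used in the OVER clause?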
If you want every row of an empid to have the same value, don't use ORDER BY:
SELECT empid, ordermonth, val,
SUM(val) OVER (PARTITION BY empid) AS runval
FROM Sales.EmpOrders;
Otherwise, your two expressions are equivalent, but only if the sort key is unique. The default is explained in the documentation:
If ROWS/RANGE is not specified but ORDER BY is specified, RANGE
UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window
frame.
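A minimal sketch of the two cases side by side. It runs on SQLite through Python's built-in `sqlite3` module rather than on SQL Server (assumption: the bundled SQLite is 3.25 or newer, the release that added window functions). The `EmpOrders` table and its rows are made up as a stand-in for `Sales.EmpOrders`. Without ORDER BY the frame is the whole partition, so every row gets the partition total; with ORDER BY the default frame produces a running total.

```python
import sqlite3

# In-memory stand-in for Sales.EmpOrders; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmpOrders (empid INTEGER, ordermonth TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO EmpOrders VALUES (?, ?, ?)",
    [
        (1, "2006-07-01", 100),
        (1, "2006-08-01", 200),
        (2, "2006-07-01", 50),
        (2, "2006-08-01", 70),
    ],
)

rows = conn.execute(
    """
    SELECT empid, ordermonth, val,
           -- no ORDER BY: the frame is the whole partition
           SUM(val) OVER (PARTITION BY empid) AS total,
           -- ORDER BY with no ROWS/RANGE: default frame, i.e. a running total
           SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth) AS runval
    FROM EmpOrders
    ORDER BY empid, ordermonth
    """
).fetchall()

for row in rows:
    print(row)
```

Here the `total` column repeats 300 and 120 on every row of employees 1 and 2, while `runval` accumulates month by month.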
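For running totals (and similar calculations), the difference becomes visible when two rows tie on the ORDER BY ... columns. Consider this example, in which the employee has two orders on 2006-09-01: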
DECLARE @T TABLE (empid INT, ordermonth DATE, val INT);
INSERT INTO @T VALUES
(1, '2006-07-01', 100),
(1, '2006-08-01', 100),
(1, '2006-09-01', 100),
(1, '2006-09-01', 100),
(1, '2006-10-01', 100);
SELECT empid, ordermonth, val,
runval_rows = SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
runval_auto = SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth)
FROM @T
ORDER BY empid, ordermonth;
empid | ordermonth | val | runval_rows | runval_auto
1 | 2006-07-01 | 100 | 100 | 100
1 | 2006-08-01 | 100 | 200 | 200
1 | 2006-09-01 | 100 | 300* | 400*
1 | 2006-09-01 | 100 | 400* | 400*
1 | 2006-10-01 | 100 | 500 | 500
If the ROWS/RANGE clause is not specified, SQL Server defaults to:
If ROWS/RANGE is not specified but ORDER BY is specified, RANGE
UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window
frame.
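Put simply, SQL Server defines a range* as the set of rows within a partition that have the same values in the columns specified in the ORDER BY
clause. That is why the second variant treats the 3rd and 4th rows as part of the same range and includes both of them when computing the running total.
* Note that this definition differs from the "standard" definition; this answer applies to SQL Server only.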
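The tie case can be reproduced the same way, again as a sketch on SQLite via Python's `sqlite3` (SQLite 3.25+ assumed), whose default frame with ORDER BY is likewise RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The `orderid` column is hypothetical, added only to give each row a unique sort key: once it is appended to ORDER BY there are no tied rows, so the default frame yields the same 100/200/300/400/500 running total as ROWS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same data as the @T example; orderid is a hypothetical unique key used as a tiebreaker.
conn.execute(
    "CREATE TABLE T (orderid INTEGER, empid INTEGER, ordermonth TEXT, val INTEGER)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2006-07-01", 100),
        (2, 1, "2006-08-01", 100),
        (3, 1, "2006-09-01", 100),  # ties with orderid 4 on ordermonth
        (4, 1, "2006-09-01", 100),  # ties with orderid 3 on ordermonth
        (5, 1, "2006-10-01", 100),
    ],
)

rows = conn.execute(
    """
    SELECT orderid,
           -- ROWS frame: the tied rows get 300 and 400, in no guaranteed order
           SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runval_rows,
           -- default frame: rows with the same ordermonth are included, so both get 400
           SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth) AS runval_auto,
           -- default frame with a unique sort key: no ties, so it matches ROWS
           SUM(val) OVER (PARTITION BY empid ORDER BY ordermonth, orderid) AS runval_unique
    FROM T
    ORDER BY orderid
    """
).fetchall()

for row in rows:
    print(row)
```

Note that `runval_rows` itself is only deterministic for the untied rows; adding a tiebreaker to ORDER BY is also what makes a ROWS-based running total repeatable.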