Window function to count occurrences in the last 10 minutes

Question

I can count the occurrences in the last ten minutes using a traditional subquery, like this:

drop table if exists [dbo].[readings]
go

create table [dbo].[readings](
    [server] [int] NOT NULL,
    [sampled] [datetime] NOT NULL
)
go

insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go

-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server,sampled,(select count(*) from readings r2 where r2.server=r1.server and r2.sampled <= r1.sampled and r2.sampled > dateadd(minute,-10,r1.sampled)) as countinlast10minutes
from readings r1
order by server,sampled
go

How can I get the same result using a window function? I tried this:

select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled

But the result is just a running count. Is there any system variable that refers to the current row pointer? Something like currentrow.sampled?

Answer 1

To my knowledge, there is no simple, exact replacement for your subquery using window functions.

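Window functions operate on a set of rows and let you work with them based on partitions and order. What you are trying to do is not a kind of partitioning that window functions can work with. Generating the partitions needed to use a window function in this case would just result in overly complicated code.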
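To see why the attempt in the question collapses into a running count: the CASE expression inside count(...) is evaluated once per input row, before the window frame is applied, so sampled and r1.sampled always refer to the same row. A minimal sketch against the readings table from the question (the column alias case_value is only illustrative):

```sql
-- The CASE from the question, evaluated on its own.
-- Both "sampled" and "r1.sampled" resolve to the same row,
-- so the condition holds for every row.
select r1.server,
       r1.sampled,
       case when sampled <= r1.sampled
             and sampled > dateadd(minute, -10, r1.sampled)
            then 1 else null end as case_value
from readings r1
order by r1.server, r1.sampled;
```

Since case_value is 1 on every row, counting it over rows between unbounded preceding and current row simply counts the rows seen so far in the partition.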

I would suggest cross apply() as an alternative to your subquery.

I am not sure whether you meant to restrict your results to within 9 minutes, but with sampled > dateadd(...) that is what happens in your original subquery.
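The boundary difference is easiest to see for a single point in time. A small sketch, assuming the sample data from the question; the variable @t and the column aliases are made up for illustration:

```sql
declare @t datetime = '20170101 08:40';

select
    -- original predicate: the reading at exactly 08:30 is excluded
    StrictLowerBound    = count(case when sampled >  dateadd(minute, -10, @t) then 1 end),
    -- inclusive predicate: the reading at exactly 08:30 is counted
    InclusiveLowerBound = count(case when sampled >= dateadd(minute, -10, @t) then 1 end)
from readings
where server = 1
  and sampled <= @t;
```

With the sample data this should return 3 and 4, which matches the OriginalSubquery and CrossApply columns for the 08:40 row in the results further down.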

Here is what a window function could look like, based on partitioning the samples into 10 minute windows, alongside a cross apply() version.

select 
    r.server
  , r.sampled
  , CrossApply       = x.CountRecent
  , OriginalSubquery = (
      select count(*) 
      from readings s
      where s.server=r.server
        and s.sampled <= r.sampled
        /* doesn't include 10 minutes ago */
        and s.sampled > dateadd(minute,-10,r.sampled)
        )
  , Slices           = count(*) over(
      /* partition by server, 10 minute slices, not the same thing*/
      partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
      order by sampled
      )
from readings r
  cross apply (
    select CountRecent=count(*) 
    from readings i
    where i.server=r.server
      /* changed to >= */
      and i.sampled >= dateadd(minute,-10,r.sampled) 
      and i.sampled <= r.sampled 
     ) as x
order by server,sampled

Results: http://rextester.com/BMMF46402

+--------+---------------------+------------+------------------+--------+
| server |       sampled       | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
|      1 | 01.01.2017 08:00:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:02:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:05:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:30:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:31:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:37:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:40:00 |          4 |                3 |      1 |
|      1 | 01.01.2017 08:41:00 |          4 |                3 |      2 |
|      1 | 01.01.2017 09:07:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 09:08:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 09:09:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 09:11:00 |          4 |                4 |      1 |
+--------+---------------------+------------+------------------+--------+

Answer 2

This is not a very satisfying answer, but one possibility is to first create a helper table containing every minute in the range:

CREATE TABLE #DateTimes(datetime datetime primary key);

WITH E1(N) AS 
(
    SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
                            (1),(1),(1),(1),(1)) V(N)
)                                       -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b)   -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b)   -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b)   -- 1*10^8 or 100,000,000 rows
 ,R(StartRange, EndRange)
 AS (SELECT MIN(sampled),
            MAX(sampled)
     FROM   readings)
 ,N(N)
 AS (SELECT ROW_NUMBER()
              OVER (
                ORDER BY (SELECT NULL)) AS N
     FROM   E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM   N,
       R;

Then you can use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW:

WITH T1 AS
( SELECT  Server,
                  MIN(sampled) AS StartRange,
                  MAX(sampled) AS EndRange
         FROM     readings
         GROUP BY Server )
SELECT      Server,
            sampled,
            Cnt
FROM        T1
CROSS APPLY
            ( SELECT   r.sampled,
                                COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
                      FROM      #DateTimes N
                      LEFT JOIN readings r
                      ON        r.sampled = N.datetime
                                AND r.server = T1.server
                      WHERE     N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE       CA.sampled IS NOT NULL
ORDER BY    sampled

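The above assumes that there is at most one sample per minute and that all times are exact minutes. If that is not true, another table expression would be needed that pre-aggregates by the datetime rounded to the minute.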
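A rough sketch of what such a pre-aggregating table expression might look like. Note that it truncates each sample to the start of its minute rather than rounding to the nearest minute, and it keeps a per-minute count. The names PerMinute, sampled_minute and samples are placeholders, not part of the query above:

```sql
with PerMinute as
(
    select server,
           -- whole minutes since the base date 0 (1900-01-01), added back to it
           sampled_minute = dateadd(minute, datediff(minute, 0, sampled), 0),
           samples        = count(*)
    from   readings
    group by server,
             dateadd(minute, datediff(minute, 0, sampled), 0)
)
select server, sampled_minute, samples
from   PerMinute
order by server, sampled_minute;
```

The main query would then presumably join to PerMinute instead of readings and use SUM(samples) in place of COUNT(r.sampled), so that several samples falling in the same minute are all counted. It would return one row per minute that has samples rather than one row per sample.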

Answer 3

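Thanks Martin and SqlZim for your answers. I am going to raise a Connect enhancement request for something like %%currentrow that could be used in window aggregates. I think this would lead to much simpler and more natural SQL:

select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over ( ...whatever the window is...)

We can already use expressions like this:

select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)

So it would be great if we could reference the columns of the current row.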
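In the meantime, one workaround that needs only a window aggregate and no correlated subquery is to turn each reading into two events, +1 when it is sampled and -1 ten minutes later, and take a running sum. This is only a sketch under the assumptions noted in the comments: the CTE names events and running and the columns ts and delta are made up for the example, and it relies on the range frame, which as far as I know is available from SQL Server 2012.

```sql
with events as
(
    -- each reading enters the 10 minute window at its own timestamp ...
    select server, ts = sampled, delta = 1
    from   readings
    union all
    -- ... and leaves it exactly 10 minutes later
    select server, dateadd(minute, 10, sampled), -1
    from   readings
),
running as
(
    select server, ts, delta,
           countinlast10minutes = sum(delta) over (
               partition by server
               -- at the same instant, -1 sorts before +1, so a reading from
               -- exactly 10 minutes ago is excluded (same as the ">" predicate)
               order by ts, delta
               -- range (not rows) so that readings with identical
               -- timestamps count each other
               range between unbounded preceding and current row
           )
    from   events
)
select server, sampled = ts, countinlast10minutes
from   running
where  delta = 1        -- keep only the rows that correspond to real readings
order by server, sampled;
```

With the sample data from the question this should reproduce the counts of the original subquery, for example 3 at 08:40 and 3 at 08:41. The cost is that the window aggregate has to process twice as many rows as there are readings.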

tsql

sql-server

window-functions