MSSQL2012 (Over/Partition) Last 12 months, nulls included?

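So, I've been stumbling over getting the actual sum/avg of the last 12 months (including the month being processed), partitioned by product code, in the case where a product may not exist in a given month.

I tried to start with a TL;DR, so let's get to the actual thing:

I've tried: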

SELECT * FROM (
  SELECT year, month, product,
  AVG(value) OVER (
    PARTITION BY product 
    ORDER BY year, month
    ROWS 11 PRECEDING
  ) as average,
  SUM(value) OVER (
    PARTITION BY product 
    ORDER BY year, month
    ROWS 11 PRECEDING
  ) as sum
  FROM suchDB.muchUSER.awesomeTABLE
) q
where year = <insert year> and month = <month>

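Problems:

We used to query this with 'GROUP BY product' and 'WHERE ((year-1)*100)+month > queriedYear-1Month AND (year*100)+month <= queriedYearMonth' until someone pointed us toward OVER/PARTITION and we changed everything... but even so we still ran into problems, like the avg() function ignoring NULL months...

Help?

Forgot one very important thing

The data in awesomeTABLE is versioned: each year/month can have multiple versions, and only the latest one may be used. I usually handle this by joining to select distinct year, month, max(version) from awesomeTABLE group by year, month, but that seems to kill some of the possible solutions...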

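If you have a table of products and a table of dates, I would left join them to your query above so that all products and all dates are represented, and then sum the result you get with your method.

Your first bullet should be resolved, because all dates and all products will be represented.

I believe your second bullet happens because "rows 11 preceding" really takes the previous 11 rows regardless of their dates. That is solved by substituting 0 for null for the month/product combinations that are currently missing but that the left-join approach would supply.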
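
A minimal sketch of that effect, using a 3-row frame (ROWS 2 PRECEDING) instead of 12 to keep it short. The sales and cal CTEs and all of their values are made up for illustration; product 'A' has no row for 2015-02.

```sql
WITH sales AS (
    -- hypothetical sparse data: 2015-02 is missing
    SELECT * FROM (VALUES
        (2014, 12, 'A', 10),
        (2015,  1, 'A', 20),
        (2015,  3, 'A', 30)
    ) v([year], [month], product, value)
),
cal AS (
    -- hypothetical list of every month in the period
    SELECT * FROM (VALUES (2014, 12), (2015, 1), (2015, 2), (2015, 3)) c(y, m)
)
-- sparse: the frame counts rows, so 2015-03 reaches back to 2014-12 (10 + 20 + 30 = 60)
SELECT 'sparse' AS source, [year], [month],
       SUM(value) OVER (PARTITION BY product ORDER BY [year], [month]
                        ROWS 2 PRECEDING) AS sum_3
FROM sales
UNION ALL
-- dense: every month has a row, so 2015-03 covers 2015-01..2015-03 (20 + 0 + 30 = 50)
SELECT 'dense', c.y, c.m,
       SUM(COALESCE(s.value, 0)) OVER (ORDER BY c.y, c.m ROWS 2 PRECEDING)
FROM cal c
LEFT JOIN sales s ON s.[year] = c.y AND s.[month] = c.m
ORDER BY source DESC, [year], [month];
```

With more than one product, the dense branch would also need the cross join to a product list and a PARTITION BY on the product.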

There are lots of possible fixes for the missing months. Here's one way that doesn't change your original query too much:

WITH ym as (
    -- the 12 months ending at the queried month; '<month>' must be two digits, e.g. '03'
    select
         year(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as y,
        month(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as m
    from (values
         (0), (-1), (-2), (-3),  (-4),  (-5),
        (-6), (-7), (-8), (-9), (-10), (-11)
    ) ofs(n)
)
SELECT * FROM (
    SELECT
        y, m, p.product,
        AVG(coalesce(value, 0)) OVER (
          PARTITION BY p.product 
          ORDER BY y, m
          ROWS 11 PRECEDING
        ) as average,
        SUM(value) OVER (
          PARTITION BY p.product 
          ORDER BY y, m
          ROWS 11 PRECEDING
        ) as "sum"
    FROM
        ym cross join
        (select distinct product from suchDB.muchUSER.awesomeTABLE) p
        left outer join suchDB.muchUSER.awesomeTABLE t
            on t."year" = ym.y and t."month" = ym.m and t.product = p.product
) q
-- filter outside the derived table: an inner WHERE would be applied before the window functions
where y = <insert year> and m = <month>

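You could do an order by y desc, m desc with select top 1 (top 1 with ties if you want every product) instead of the final where. I usually try to avoid top, but I'm not sure how you're dropping in the parameters, and with some programming libraries it may be less of a hassle to avoid referencing them twice. Even if you're doing this by hand, you'd still have to remember to look in two places in a long query.

Since you only seem to need the aggregate for a single month, I think you can use this simpler version that doesn't use window functions: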

WITH ym as (
    -- the 12 months ending at the queried month; '<month>' must be two digits
    select
        dateadd(month, n, cast('<year>' + '<month>' + '01' as date)) as dt,
         year(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as y,
        month(dateadd(month, n, cast('<year>' + '<month>' + '01' as date))) as m
    from (values
         (0), (-1), (-2), (-3),  (-4),  (-5),
        (-6), (-7), (-8), (-9), (-10), (-11)
    ) ofs(n)
)
SELECT
    year(max(dt)) as "year", month(max(dt)) as "month", p.product,
    AVG(coalesce(value, 0)) as average,
    SUM(value) as "sum"
FROM
    ym cross join
    (select distinct product from suchDB.muchUSER.awesomeTABLE) p
    left outer join suchDB.muchUSER.awesomeTABLE t
    on t."year" = ym.y and t."month" = ym.m and t.product = p.product
GROUP BY p.product

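I didn't know whether you want to limit this to products that actually had sales in the prior year, so I didn't handle that there.

If you start thinking about how to make this more generic and reusable, you might end up with something more like this. I went ahead and added the restriction of products to those with activity in the prior year: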

WITH dt as (
    -- the queried month, stated once; '<month>' must be two digits
    select cast('<year>' + '<month>' + '01' as date) as dt
),
ym as (
    -- the 12 months ending at the queried month
    select
        dateadd(month, n, dt) as dt,
        year(dateadd(month, n, dt)) as y,
        month(dateadd(month, n, dt)) as m
    from (values
         (0), (-1), (-2), (-3),  (-4),  (-5),
        (-6), (-7), (-8), (-9), (-10), (-11)
    ) ofs(n) cross join dt
)
SELECT
    year(max(dt)) as "year", month(max(dt)) as "month", p.product,
    AVG(coalesce(value, 0)) as average, SUM(value) as "sum"
FROM
    ym cross join
    (
        -- only products with activity inside the 12-month window
        select distinct product from suchDB.muchUSER.awesomeTABLE
        where datefromparts("year", "month", 1) between
                (select min(dt) from ym) and (select max(dt) from ym)
    ) p
    left outer join (
        select distinct /* get the latest "version" only */
            "year", "month", product,
            first_value(value)
                over (partition by "year", "month", product order by version desc) as value
        from suchDB.muchUSER.awesomeTABLE
    ) t
        on t."year" = ym.y and t."month" = ym.m and t.product = p.product
GROUP BY p.product

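The final query also attempts to handle your filter for only the latest version. You'll need a later version of SQL Server (2012 or newer) for the first_value() function, though.

The biggest problem is that you need two lists to aggregate the values in your dataset correctly: a list of dates and a list of products. Without both lists, a product missing from the last month means that product won't be reported, or (as you've already found) the wrong 12 months may be aggregated (a missing July means the 11 preceding rows reach back one month too far).

Below is a fully expanded walk-through of the process for generating those lists. It uses only the source data table (assuming something is sold every month). It could be made more concise (i.e. by calculating the dates, as in shawnt's example above), but it is written to show all of the steps and assumptions. I wrapped it in a stored procedure because that explicitly shows the values being passed in.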

    CREATE PROCEDURE DoTheThing 
    @startDate DATE  -- Should be Year-Month-01 or YearMonth01
    AS
    BEGIN
    DECLARE @yr INT, @mth INT,
        @yr2 INT, @mth2 INT,
        @endDate DATE;   -- @startDate - 11 months

    -- if the date will be passed in with a day other than 01, add code here to set the day on the passed date to 01
    -- if only the high year and month are passed in, then create a @startDate value and continue.

    SET @endDate = DATEADD(MONTH, -11, @startDate);

    SELECT @yr = DATEPART(YEAR, @startDate),
        @mth = DATEPART(MONTH, @startDate),
        @yr2 = DATEPART(YEAR, @endDate),    -- the second year/month pair comes from the end of the window
        @mth2 = DATEPART(MONTH, @endDate);

    WITH mthYr AS (
        SELECT DISTINCT 
            [year], 
            [month]
        FROM suchDB.muchUSER.awesomeTABLE   -- Get the data from the source table
        WHERE (
            [year] = @yr              -- if in the passed-in year, then take all months less than or equal to the start month
            AND [month] <= @mth
            )
            OR (
            [year] = @yr2             -- if the period is Jan -- Dec in one year, this reiterates the above
            AND [month] >= @mth2      -- if not, select the months in the second year where the month is greater than or equal to the calculated month
            )
        ), 
    prods AS (
        SELECT DISTINCT product     -- Return a list of products sold during the year.
        FROM suchDB.muchUSER.awesomeTABLE smt
        INNER JOIN mthYr
            ON mthYr.[year] = smt.[year]
            AND mthYr.[month] = smt.[month]
        )

    SELECT @yr AS [year],               -- current report only shows passed in year/month value
        @mth AS [month], 
        prods.product,                  
        AVG(ISNULL(smt.value, 0.00)) AS average,   -- ISNULL adds a zero into the list to be averaged
        SUM(ISNULL(smt.value, 0.00)) AS [sum]      -- not really necessary, but no warnings about NULL values will be generated
    FROM mthYr CROSS JOIN prods         -- cross join of the two lists means all of the products sold will have a value for each month
    LEFT JOIN suchDB.muchUSER.awesomeTABLE smt  -- left join so missing productMonths will still be added in
        ON smt.[year] = mthYr.[year]
        AND smt.[month] = mthYr.[month] 
        AND prods.product = smt.product
    GROUP BY prods.product              -- one row per product
    ORDER BY prods.product;
END
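
To see why the ISNULL(..., 0.00) wrapper matters for the average but not for the sum, here is a small sketch on three made-up values, one of them NULL, standing in for a month where the left join found no row:

```sql
-- hypothetical values: two months with data, one month with none
SELECT AVG(value)               AS avg_ignoring_nulls,  -- (30 + 60) / 2 = 45
       AVG(ISNULL(value, 0.00)) AS avg_nulls_as_zero,   -- (30 + 60 + 0) / 3 = 30
       SUM(value)               AS sum_ignoring_nulls   -- 90 either way
FROM (VALUES (30.00), (60.00), (NULL)) v(value);
```

AVG skips NULL inputs, so a missing month shrinks the divisor unless it is turned into a zero first.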

The first thing to do when working with dates is to have a real date field instead of separate year, month, and day fields

SELECT year, month, product, value
     , DATEFROMPARTS(year, month, 1) fullDate
FROM   suchDB.muchUSER.awesomeTABLE

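Since rows can be versioned, we need to get the latest version for each year, month, and product. That can be done in several ways, for example with window functions or with a self-join; the latter looks like this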

SELECT b.[year], b.[month], b.[product], [value]
     , DATEFROMPARTS(b.[year], b.month, 1) fullDate
FROM   suchDB.muchUSER.awesomeTABLE b
       INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                   FROM   suchDB.muchUSER.awesomeTABLE
                   GROUP BY [year], [month], [product]
                  ) m ON b.[year] = m.year AND b.month = m.month 
                     AND b.product = m.product AND b.[version] = m.lv
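
The window-function alternative mentioned above could look roughly like the following sketch. It is not taken from the answer: the awesome CTE and its rows are hypothetical stand-ins for suchDB.muchUSER.awesomeTABLE, and ROW_NUMBER() is just one of several ways to pick the highest version.

```sql
WITH awesome AS (
    -- hypothetical rows: 2015-10 has two versions, 2015-11 has one
    SELECT * FROM (VALUES
        (2015, 10, 'A', 1, 100),
        (2015, 10, 'A', 2, 110),
        (2015, 11, 'A', 1, 120)
    ) v([year], [month], product, [version], value)
),
ranked AS (
    -- number the versions of each year/month/product, newest first
    SELECT [year], [month], product, value,
           ROW_NUMBER() OVER (PARTITION BY [year], [month], product
                              ORDER BY [version] DESC) AS rn
    FROM awesome
)
SELECT [year], [month], product, value,
       DATEFROMPARTS([year], [month], 1) AS fullDate
FROM ranked
WHERE rn = 1;   -- keeps value 110 for 2015-10 and 120 for 2015-11
```

This reads the table once instead of joining it to an aggregate of itself.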

A list of products will also come in handy

SELECT DISTINCT product
FROM   suchDB.muchUSER.awesomeTABLE

Getting the last year's worth of data can be done in several ways; I like CROSS APPLY

Declare @_year int
Declare @_month int

Set @_year = 2015
Set @_month = 11

;With baseDate AS (
  SELECT b.[year], b.[month], b.[product], [value]
       , DATEFROMPARTS(b.[year], b.month, 1) fullDate
  FROM   suchDB.muchUSER.awesomeTABLE b
         INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                     FROM   suchDB.muchUSER.awesomeTABLE
                     GROUP BY [year], [month], [product]
                    ) m ON b.[year] = m.year AND b.month = m.month 
                       AND b.product = m.product AND b.[version] = m.lv
), Products AS (
  SELECT DISTINCT [product]
  FROM   suchDB.muchUSER.awesomeTABLE
)
SELECT @_year [Year], @_month [Month], p.[product]
     , ly.Average
     , ly.[Sum]
FROM   Products p
       CROSS APPLY (SELECT Sum(lastYear.Value) / 12.0 Average
                         , Sum(lastYear.Value) [Sum]
                    FROM   baseDate lastYear
                    WHERE  lastYear.fullDate > DATEFROMPARTS(@_year - 1, @_month, 1)
                      AND  lastYear.fullDate <= DATEFROMPARTS(@_year, @_month, 1)
                      AND  lastYear.product = p.product
                   ) ly 
WHERE  ly.[Sum] IS NOT NULL

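The WHERE condition removes the products that have no rows in the year before the parameters.

To get rid of the variables and get the values for every month, a calendar table is needed. If every month is present in the table (across all products), we can get the dates with a DISTINCT; selecting from the first CTE instead of the table gives us the full date as well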

SELECT DISTINCT [year], [month], fullDate
FROM   baseDate

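Otherwise there are different ways to create a calendar table. We can add the calendar table to the main query's CTEs and use it in the CROSS APPLY instead of the variables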

;With baseDate AS (
  SELECT b.[year] _y, b.[month] _m, b.[product], [value]
       , DATEFROMPARTS(b.[year], b.month, 1) fullDate
  FROM   suchDB.muchUSER.awesomeTABLE b
         INNER JOIN (SELECT [year], [month], [product], max([version]) lv
                     FROM   suchDB.muchUSER.awesomeTABLE
                     GROUP BY [year], [month], [product]
                    ) m ON b.[year] = m.year AND b.month = m.month 
                       AND b.product = m.product AND b.[version] = m.lv
), Products AS (
  SELECT DISTINCT [product]
  FROM   suchDB.muchUSER.awesomeTABLE
), Months As (
  SELECT DISTINCT _y, _m, fullDate
  FROM   baseDate
)
SELECT _y [Year], _m [Month], p.[product]
     , ly.Average
     , ly.[Sum]
     , ly.[Count]
FROM   Products p
       CROSS APPLY (SELECT m._y, m._m
                         , Sum(bd.Value) / 12.0 Average
                         , Sum(bd.Value) [Sum]
                         , Count(Value) [Count]
                    FROM   Months m
                           LEFT JOIN baseDate bd 
                                  ON bd.fullDate > DATEADD(YY, -1, m.fullDate)
                                 AND bd.fullDate <= m.fullDate
                    WHERE  bd.product = p.product
                    GROUP BY m._y, m._m
                   ) ly 
WHERE  ly.[Sum] IS NOT NULL
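
For the case where the source table does not contain every month, one of the "different ways to create a calendar table" is a recursive CTE. This is only a sketch: the @firstMonth and @lastMonth variables and their values are assumptions, and the output columns are named to match the Months CTE above so that it could stand in for it.

```sql
DECLARE @firstMonth date = '2014-01-01',  -- assumed start of the reporting range
        @lastMonth  date = '2015-12-01';  -- assumed end of the reporting range

WITH Months AS (
    SELECT @firstMonth AS fullDate
    UNION ALL
    -- step one month at a time until the last month is reached
    SELECT DATEADD(MONTH, 1, fullDate)
    FROM Months
    WHERE fullDate < @lastMonth
)
SELECT YEAR(fullDate) AS _y, MONTH(fullDate) AS _m, fullDate
FROM Months
OPTION (MAXRECURSION 0);  -- lifts the default limit of 100 recursion levels for longer ranges
```

A permanent calendar table filled once would avoid regenerating the rows on every query.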