Get DISTINCT COUNT in one pass in SQL Server

Question

I have a table that looks like this:

Region    Country    Manufacturer    Brand    Period    Spend
R1        C1         M1              B1       2016      5
R1        C1         M1              B1       2017      10
R1        C1         M1              B1       2017      20
R1        C1         M1              B2       2016      15
R1        C1         M1              B3       2017      20
R1        C2         M1              B1       2017      5
R1        C2         M2              B4       2017      25
R1        C2         M2              B5       2017      30
R2        C3         M1              B1       2017      35
R2        C3         M2              B4       2017      40
R2        C3         M2              B5       2017      45
...

I wrote the following query to aggregate it:

SELECT [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,Period
    ,SUM([Spend]) AS [Spend]
FROM myTable
GROUP BY [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,[Period]
ORDER BY 1,2,3,4

which produces the following:

Region    Country    Manufacturer    Brand    Period    Spend
R1        C1         M1              B1       2016      5
R1        C1         M1              B1       2017      30 -- this row is an aggregate from raw table above
R1        C1         M1              B2       2016      15
R1        C1         M1              B3       2017      20
R1        C2         M1              B1       2017      4  -- aggregated result
R1        C2         M2              B4       2017      25
R1        C2         M2              B5       2017      30
R2        C3         M2              B4       2017      40
R2        C3         M2              B5       2017      45

I want to add another column to the table above that shows the DISTINCT COUNT of Brand grouped by Region, Country, Manufacturer and Period. The final table would then look like this:

Region    Country    Manufacturer    Brand    Period    Spend    UniqBrandCount
R1        C1         M1              B1       2016      5        2 -- two brands by R1, C1, M1 in 2016
R1        C1         M1              B1       2017      30       1
R1        C1         M1              B2       2016      15       2 -- same as first row's result
R1        C1         M1              B3       2017      20       1
R1        C2         M1              B1       2017      4        1
R1        C2         M2              B4       2017      25       2
R1        C2         M2              B5       2017      30       2
R2        C3         M2              B4       2017      40       2
R2        C3         M2              B5       2017      45       2

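I know how to get the final result in three steps.

Run this query (Query #1):

SELECT [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Period]
    ,COUNT(DISTINCT [Brand]) AS [BrandCount]
INTO Temp1
FROM myTable
GROUP BY [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Period]

Run this query (Query #2):

SELECT [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,YEAR([Period]) AS Period
    ,SUM([Spend]) AS [Spend]
INTO Temp2
FROM myTable
GROUP BY [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,[Period]

Then LEFT JOIN Temp2 to Temp1 to bring in [BrandCount] from the latter, as follows:

SELECT a.*
    ,b.*
FROM Temp2 AS a
LEFT JOIN Temp1 AS b
    ON a.[Region] = b.[Region]
    AND a.[Country] = b.[Country]
    AND a.[Manufacturer] = b.[Manufacturer]
    AND a.[Period] = b.[Period]

I'm pretty sure there is a more efficient way to do this, isn't there? Thanks in advance for your suggestions/answers!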

Answer 1

The tag on your question, window-functions, suggests you already have a pretty good idea.

For the DISTINCT COUNT of Brand grouped by Region, Country, Manufacturer and Period, you can write:

Select   Region 
        ,Country
        ,Manufacturer
        ,Brand
        ,Period
        ,Spend
        ,DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand asc) 
         + DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand desc) 
         -1 UniqBrandCount
From myTable T1
Order By 1,2,3,4

Answer 2

Borrowing heavily from this question: https://dba.stackexchange.com/questions/89031/using-distinct-in-window-function-with-over

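COUNT(DISTINCT ...) cannot be used with an OVER clause in SQL Server, so dense_rank is needed instead. Rank the brands in ascending order and again in descending order, add the two ranks, and subtract 1 to get the distinct count.

Your SUM can also be rewritten using PARTITION BY logic. That way you can use a different grouping level for each aggregate: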

SELECT 
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,dense_rank() OVER 
    (PARTITION BY 
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Period] Order by Brand) 
+ dense_rank() OVER 
    (PARTITION BY 
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Period] Order by Brand Desc) 
- 1  
AS [BrandCount]
,SUM([Spend]) OVER
    (PARTITION BY
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,[Period]) as [Spend]
from
myTable
ORDER BY 1,2,3,4

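You may then need to reduce the number of rows in the output, because this syntax returns the same number of rows as myTable, with the aggregated totals repeated on every row they apply to: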

R1  C1  M1  B1  2016    2   5
R1  C1  M1  B1  2017    2   30 --dup1
R1  C1  M1  B1  2017    2   30 --dup1
R1  C1  M1  B2  2016    2   15
R1  C1  M1  B3  2017    2   20
R1  C2  M1  B1  2017    1   5
R1  C2  M2  B4  2017    2   25
R1  C2  M2  B5  2017    2   30
R2  C3  M1  B1  2017    1   35
R2  C3  M2  B4  2017    2   40
R2  C3  M2  B5  2017    2   45

Selecting the distinct rows from this output gives you what you need.
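To make the duplicate rows and the final de-duplication step concrete, here is a minimal Python sketch (not T-SQL) that mimics the two windowed aggregates on the sample rows from the question. The variable names are made up for illustration, and the data is only the rows shown before the "..." in the question.

```python
from collections import Counter, defaultdict

# (Region, Country, Manufacturer, Brand, Period, Spend): sample rows from the question
rows = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

brands_in_partition = defaultdict(set)  # stands in for the two dense_rank() windows
spend_in_partition = Counter()          # stands in for SUM(Spend) OVER (...)
for r, c, m, b, p, s in rows:
    brands_in_partition[(r, c, m, p)].add(b)
    spend_in_partition[(r, c, m, b, p)] += s

# A window function keeps one output row per input row.
windowed = [
    (r, c, m, b, p, len(brands_in_partition[(r, c, m, p)]), spend_in_partition[(r, c, m, b, p)])
    for r, c, m, b, p, _s in rows
]

# Equivalent of SELECT DISTINCT: drop repeated rows, keep first-seen order.
deduped = list(dict.fromkeys(windowed))

print(len(windowed), len(deduped))  # 11 10
```

Only the two B1/2017 rows collapse, because they are the only input rows that share the full Region, Country, Manufacturer, Brand, Period key.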

How the dense_rank trick works

Consider this data:

Col1    Col2
B       1
B       1
B       3
B       5
B       7
B       9

dense_rank() ranks each row as the number of distinct items that sort before the current item, plus 1. So:

1->1, 3->2, 5->3, 7->4, 9->5.

In reverse order (using desc) this produces the opposite pattern:

1->5, 3->4, 5->3, 7->2, 9->1.

Adding the two ranks together gives the same value for every row:

1+5 = 2+4 = 3+3 = 4+2 = 5+1 = 6

Spelling it out in words helps here:

(number of distinct items before + 1) + (number of distinct items after + 1) 
= number of distinct OTHER items before AND after + 2 
= Total number of distinct items + 1

So, to get the total number of distinct items, add the ascending and descending dense_rank values together and subtract 1.
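The identity is easy to check outside the database. The following is a small Python sketch, not SQL Server code; `dense_ranks` is a hypothetical helper written for this illustration, and it reproduces the Col2 example above.

```python
def dense_ranks(values, descending=False):
    """Map each value to its dense rank: 1 + number of distinct values sorted before it."""
    ordered = sorted(set(values), reverse=descending)
    return {value: rank for rank, value in enumerate(ordered, start=1)}

col2 = [1, 1, 3, 5, 7, 9]  # Col2 from the example data

asc = dense_ranks(col2)                     # 1->1, 3->2, 5->3, 7->4, 9->5
desc = dense_ranks(col2, descending=True)   # 1->5, 3->4, 5->3, 7->2, 9->1

# Ascending rank + descending rank - 1, computed for every row.
distinct_counts = [asc[v] + desc[v] - 1 for v in col2]

print(distinct_counts)  # [5, 5, 5, 5, 5, 5]
```

Every row gets 5, which is the number of distinct values in Col2, even though the column has six rows.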

Answer 3

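The double dense_rank idea means you need two sorts (assuming there is no index that already provides the sort order). Assuming there are no NULL brands (as that idea also assumes), you can use a single dense_rank and a windowed MAX, as below: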

WITH T1
     AS (SELECT *,
                DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period] ORDER BY Brand) AS [dr]
         FROM   myTable),
     T2
     AS (SELECT *,
                MAX([dr]) OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]) AS UniqBrandCount
         FROM   T1)
SELECT [Region],
       [Country],
       [Manufacturer],
       [Brand],
       Period,
       SUM([Spend])        AS [Spend],
       MAX(UniqBrandCount) AS UniqBrandCount
FROM   T2
GROUP  BY [Region],
          [Country],
          [Manufacturer],
          [Brand],
          [Period]
ORDER  BY [Region],
          [Country],
          [Manufacturer],
          [Period],
          Brand

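The above has some unavoidable spooling (it is not possible to do this in a 100% streaming fashion), but only a single sort.

Oddly, the final ORDER BY clause is needed to keep the number of sorts at one (or zero if a suitable index exists).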
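As a rough analogue of the single dense_rank plus windowed MAX idea, here is a Python sketch (not T-SQL, and it says nothing about the actual execution plan). `uniq_brand_counts` is a hypothetical helper; like the query, it assumes there are no NULL (None) brands, and it uses only the sample rows shown in the question.

```python
from collections import defaultdict

# (Region, Country, Manufacturer, Brand, Period, Spend): sample rows from the question
rows = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]

def uniq_brand_counts(rows):
    """Distinct brands per (Region, Country, Manufacturer, Period); assumes no None brands."""
    partitions = defaultdict(list)
    for region, country, manufacturer, brand, period, _spend in rows:
        partitions[(region, country, manufacturer, period)].append(brand)

    counts = {}
    for key, brands in partitions.items():
        brands.sort()                      # the one sort, by Brand within the partition
        dr, max_dr, previous = 0, 0, object()
        for brand in brands:
            if brand != previous:          # dense rank advances only on a new value
                dr += 1
                previous = brand
            max_dr = max(max_dr, dr)       # plays the role of MAX(dr) OVER (PARTITION BY ...)
        counts[key] = max_dr
    return counts

counts = uniq_brand_counts(rows)
print(counts[("R1", "C1", "M1", 2017)])  # 2 (B1 and B3)
```

The largest ascending dense rank in a partition equals its distinct count, which is why the second, descending ranking can be dropped.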

sql

sql-server

window-functions
