Get DISTINCT COUNT in one pass in SQL Server

I have a table that looks like this:

Region    Country    Manufacturer    Brand    Period    Spend
R1        C1         M1              B1       2016      5
R1        C1         M1              B1       2017      10
R1        C1         M1              B1       2017      20
R1        C1         M1              B2       2016      15
R1        C1         M1              B3       2017      20
R1        C2         M1              B1       2017      5
R1        C2         M2              B4       2017      25
R1        C2         M2              B5       2017      30
R2        C3         M1              B1       2017      35
R2        C3         M2              B4       2017      40
R2        C3         M2              B5       2017      45
...

I wrote the following query to aggregate them:

SELECT [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,Period
    ,SUM([Spend]) AS [Spend]
FROM myTable
GROUP BY [Region]
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,[Period]
ORDER BY 1,2,3,4

which produces something like this:

Region    Country    Manufacturer    Brand    Period    Spend
R1        C1         M1              B1       2016      5
R1        C1         M1              B1       2017      30 -- this row is an aggregate from raw table above
R1        C1         M1              B2       2016      15
R1        C1         M1              B3       2017      20
R1        C2         M1              B1       2017      5  -- aggregated result
R1        C2         M2              B4       2017      25
R1        C2         M2              B5       2017      30
R2        C3         M1              B1       2017      35
R2        C3         M2              B4       2017      40
R2        C3         M2              B5       2017      45

I want to add another column to the table above that shows the DISTINCT COUNT of Brand, grouped by Region, Country, Manufacturer and Period. The final table would then look like this:

Region    Country    Manufacturer    Brand    Period    Spend    UniqBrandCount
R1        C1         M1              B1       2016      5        2 -- two brands by R1, C1, M1 in 2016
R1        C1         M1              B1       2017      30       2
R1        C1         M1              B2       2016      15       2 -- same as first row's result
R1        C1         M1              B3       2017      20       2
R1        C2         M1              B1       2017      5        1
R1        C2         M2              B4       2017      25       2
R1        C2         M2              B5       2017      30       2
R2        C3         M1              B1       2017      35       1
R2        C3         M2              B4       2017      40       2
R2        C3         M2              B5       2017      45       2

I know how to get the final result in three steps.

  1. Run this query (Query #1):

    SELECT [Region]
        ,[Country]
        ,[Manufacturer]
        ,[Period]
        ,COUNT(DISTINCT [Brand]) AS [BrandCount]
    INTO Temp1
    FROM myTable
    GROUP BY [Region]
        ,[Country]
        ,[Manufacturer]
        ,[Period]

  2. Run this query (Query #2):

    SELECT [Region]
        ,[Country]
        ,[Manufacturer]
        ,[Brand]
        ,YEAR([Period]) AS Period
        ,SUM([Spend]) AS [Spend]
    INTO Temp2
    FROM myTable
    GROUP BY [Region]
        ,[Country]
        ,[Manufacturer]
        ,[Brand]
        ,[Period]

  3. Then LEFT JOIN Temp2 and Temp1 to bring in [BrandCount] from the latter, like this:

    SELECT a.*
        ,b.[BrandCount]
    FROM Temp2 AS a
    LEFT JOIN Temp1 AS b
        ON a.[Region] = b.[Region]
        AND a.[Country] = b.[Country]
        AND a.[Manufacturer] = b.[Manufacturer]
        AND a.[Period] = b.[Period]
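For reference, here is a minimal Python sketch of what the three steps compute on the sample rows from the question. It is an in-memory illustration, not SQL Server code: the hypothetical ROWS list stands in for myTable, two dicts stand in for Temp1 and Temp2, and it assumes Period already holds a year, so the YEAR() call is left out.

```python
from collections import defaultdict

# Hypothetical stand-in for myTable: (Region, Country, Manufacturer, Brand, Period, Spend)
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def three_step(rows):
    # Step 1 (Temp1): distinct brands per (Region, Country, Manufacturer, Period)
    brands = defaultdict(set)
    for region, country, mfr, brand, period, _ in rows:
        brands[(region, country, mfr, period)].add(brand)
    temp1 = {key: len(found) for key, found in brands.items()}

    # Step 2 (Temp2): total spend per (Region, Country, Manufacturer, Brand, Period)
    temp2 = defaultdict(int)
    for region, country, mfr, brand, period, spend in rows:
        temp2[(region, country, mfr, brand, period)] += spend

    # Step 3: LEFT JOIN Temp2 to Temp1 on Region, Country, Manufacturer, Period
    result = []
    for (region, country, mfr, brand, period), spend in sorted(temp2.items()):
        count = temp1.get((region, country, mfr, period))  # None if there is no match
        result.append((region, country, mfr, brand, period, spend, count))
    return result


for row in three_step(ROWS):
    print(row)
```

Two passes over the data plus a join is exactly the cost the question is trying to avoid.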

I'm fairly sure there is a more efficient way to do this, isn't there? Thanks in advance for your suggestions/answers!

Your question's tag, window-functions, suggests you already have a pretty good idea.

For the DISTINCT COUNT of brands grouped by Region, Country, Manufacturer and Period, you can write:

Select   Region 
        ,Country
        ,Manufacturer
        ,Brand
        ,Period
        ,Spend
        ,DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand asc) 
         + DENSE_RANK() Over (Partition By Region, Country, Manufacturer, Period Order By Brand desc) 
         -1 UniqBrandCount
From myTable T1
Order By 1,2,3,4

This borrows heavily from this question: https://dba.stackexchange.com/questions/89031/using-distinct-in-window-function-with-over

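COUNT(DISTINCT ...) is not allowed with an OVER clause, hence the dense_rank. Rank the brands in ascending and in descending order, add the two ranks together, then subtract 1 to get the distinct count.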
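To see that arithmetic on the sample data, here is a small Python sketch (an in-memory model, not T-SQL). It partitions the question's rows by Region, Country, Manufacturer and Period, computes an ascending and a descending dense rank of Brand within each partition, and adds them minus 1. The dense_rank helper and the ROWS list are hypothetical names used only for this illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for myTable: (Region, Country, Manufacturer, Brand, Period, Spend)
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def dense_rank(values, descending=False):
    # Rank = 1 + number of distinct values that sort ahead of this one
    ordered = sorted(set(values), reverse=descending)
    return {v: i + 1 for i, v in enumerate(ordered)}


def add_uniq_brand_count(rows):
    # Collect the brands of each (Region, Country, Manufacturer, Period) partition
    partitions = defaultdict(list)
    for region, country, mfr, brand, period, _ in rows:
        partitions[(region, country, mfr, period)].append(brand)

    # Compute both rankings once per partition
    ranks = {
        key: (dense_rank(brands), dense_rank(brands, descending=True))
        for key, brands in partitions.items()
    }

    # Like a window function: one output row per input row
    result = []
    for row in rows:
        region, country, mfr, brand, period, _ = row
        asc, desc = ranks[(region, country, mfr, period)]
        # asc + desc - 1 is the number of distinct brands in the partition
        result.append(row + (asc[brand] + desc[brand] - 1,))
    return result


for row in add_uniq_brand_count(ROWS):
    print(row)
```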

Your SUM can also be rewritten with PARTITION BY logic. That way you can use a different grouping level for each aggregate:

SELECT 
[Region]
,[Country]
,[Manufacturer]
,[Brand]
,[Period]
,dense_rank() OVER 
    (PARTITION BY 
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Period] Order by Brand) 
+ dense_rank() OVER 
    (PARTITION BY 
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Period] Order by Brand Desc) 
- 1  
AS [BrandCount]
,SUM([Spend]) OVER
    (PARTITION BY
     [Region] 
    ,[Country]
    ,[Manufacturer]
    ,[Brand]
    ,[Period]) as [Spend]
from
myTable
ORDER BY 1,2,3,4

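You may then need to reduce the number of rows in the output, because this syntax returns as many rows as myTable, with each aggregate total repeated on every row it applies to: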

Region  Country  Manufacturer  Brand  Period  BrandCount  Spend
R1      C1       M1            B1     2016    2           5
R1      C1       M1            B1     2017    2           30  --dup1
R1      C1       M1            B1     2017    2           30  --dup1
R1      C1       M1            B2     2016    2           15
R1      C1       M1            B3     2017    2           20
R1      C2       M1            B1     2017    1           5
R1      C2       M2            B4     2017    2           25
R1      C2       M2            B5     2017    2           30
R2      C3       M1            B1     2017    1           35
R2      C3       M2            B4     2017    2           40
R2      C3       M2            B5     2017    2           45

Selecting the distinct rows from this output gives you what you need.
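As a rough Python model of that idea (hypothetical names, in-memory data rather than SQL Server), a windowed SUM keeps every input row and repeats the partition total on each one, and de-duplicating the rows afterwards plays the role of SELECT DISTINCT:

```python
from collections import defaultdict

# Hypothetical stand-in for myTable: (Region, Country, Manufacturer, Brand, Period, Spend)
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def windowed_sum(rows):
    # SUM(Spend) OVER (PARTITION BY Region, Country, Manufacturer, Brand, Period)
    totals = defaultdict(int)
    for row in rows:
        totals[row[:5]] += row[5]
    # One output row per input row, each carrying its partition's total
    return [row[:5] + (totals[row[:5]],) for row in rows]


windowed = windowed_sum(ROWS)
# SELECT DISTINCT: keep the first occurrence of each identical row
distinct_rows = list(dict.fromkeys(windowed))

print(len(windowed), len(distinct_rows))  # 11 10
```

The two R1/C1/M1/B1/2017 rows both carry 30 after the windowed SUM, and only one of them survives the de-duplication.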

How the dense_rank trick works

Consider this data:

Col1    Col2
B       1
B       1
B       3
B       5
B       7
B       9

dense_rank() ranks each row as the number of distinct values that come before it, plus 1. So:

1->1, 3->2, 5->3, 7->4, 9->5.

In reverse order (using desc) this produces the opposite pattern:

1->5, 3->4, 5->3, 7->2, 9->1.

Adding the two ranks together always gives the same value:

1+5 = 2+4 = 3+3 = 4+2 = 5+1 = 6

Putting it into words helps here:

(number of distinct items before + 1) + (number of distinct items after + 1) 
= number of distinct OTHER items before AND after + 2 
= Total number of distinct items + 1

So, to get the total number of distinct items, add the ascending and descending dense_rank together and subtract 1.
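The same arithmetic can be checked mechanically. This Python sketch (the dense_rank helper is a hypothetical illustration, not a SQL Server API) ranks the Col2 values above in both directions and confirms that every row sums to 6, so subtracting 1 gives the 5 distinct values:

```python
def dense_rank(values, descending=False):
    # Rank = 1 + number of distinct values that sort ahead of this one
    ordered = sorted(set(values), reverse=descending)
    return {v: i + 1 for i, v in enumerate(ordered)}


col2 = [1, 1, 3, 5, 7, 9]
asc = dense_rank(col2)
desc = dense_rank(col2, descending=True)

# value, ascending rank, descending rank, sum
for v in col2:
    print(v, asc[v], desc[v], asc[v] + desc[v])

# Any row works, because the sum is the same for all of them
distinct_count = asc[col2[0]] + desc[col2[0]] - 1
print(distinct_count)  # 5
```

Note that the duplicated 1 gets the same rank in both directions, which is why duplicates do not inflate the count.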

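The dense_rank idea means you need two sorts (assuming no index exists that provides the sort order). Assuming there are no NULL brands (an assumption that idea also makes), you can use a single dense_rank and a windowed MAX instead, as below (demo):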

WITH T1
     AS (SELECT *,
                DENSE_RANK() OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period] ORDER BY Brand) AS [dr]
         FROM   myTable),
     T2
     AS (SELECT *,
                MAX([dr]) OVER (PARTITION BY [Region], [Country], [Manufacturer], [Period]) AS UniqBrandCount
         FROM   T1)
SELECT [Region],
       [Country],
       [Manufacturer],
       [Brand],
       Period,
       SUM([Spend])        AS [Spend],
       MAX(UniqBrandCount) AS UniqBrandCount
FROM   T2
GROUP  BY [Region],
          [Country],
          [Manufacturer],
          [Brand],
          [Period]
ORDER  BY [Region],
          [Country],
          [Manufacturer],
          [Period],
          Brand 

The plan above has some unavoidable spooling (this can't be done in a 100% streaming fashion), but only a single sort.

Oddly, the final ORDER BY clause is needed to keep the number of sorts at 1 (or 0 if a suitable index exists).
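Here is a Python sketch of the single-sort variant's logic, again an in-memory model with hypothetical names rather than T-SQL: one ascending dense rank per partition (T1), a windowed MAX of that rank (T2), then the final GROUP BY and ORDER BY. Like the SQL, it assumes there are no NULL (None) brands; in Python a None mixed with strings would make sorted() raise a TypeError.

```python
from collections import defaultdict

# Hypothetical stand-in for myTable: (Region, Country, Manufacturer, Brand, Period, Spend)
ROWS = [
    ("R1", "C1", "M1", "B1", 2016, 5),
    ("R1", "C1", "M1", "B1", 2017, 10),
    ("R1", "C1", "M1", "B1", 2017, 20),
    ("R1", "C1", "M1", "B2", 2016, 15),
    ("R1", "C1", "M1", "B3", 2017, 20),
    ("R1", "C2", "M1", "B1", 2017, 5),
    ("R1", "C2", "M2", "B4", 2017, 25),
    ("R1", "C2", "M2", "B5", 2017, 30),
    ("R2", "C3", "M1", "B1", 2017, 35),
    ("R2", "C3", "M2", "B4", 2017, 40),
    ("R2", "C3", "M2", "B5", 2017, 45),
]


def partition_key(row):
    # (Region, Country, Manufacturer, Period); Period is at index 4
    return (row[0], row[1], row[2], row[4])


def single_sort_variant(rows):
    # T1: DENSE_RANK() OVER (PARTITION BY Region, Country, Manufacturer, Period ORDER BY Brand)
    brands = defaultdict(set)
    for row in rows:
        brands[partition_key(row)].add(row[3])
    ranks = {key: {b: i + 1 for i, b in enumerate(sorted(bs))} for key, bs in brands.items()}
    t1 = [row + (ranks[partition_key(row)][row[3]],) for row in rows]

    # T2: MAX(dr) OVER the same partition equals the distinct brand count
    uniq = defaultdict(int)
    for row in t1:
        uniq[partition_key(row)] = max(uniq[partition_key(row)], row[6])

    # Final GROUP BY Region, Country, Manufacturer, Brand, Period
    spend = defaultdict(int)
    for row in t1:
        spend[row[:5]] += row[5]
    result = [key + (total, uniq[partition_key(key)]) for key, total in spend.items()]

    # ORDER BY Region, Country, Manufacturer, Period, Brand
    return sorted(result, key=lambda r: (r[0], r[1], r[2], r[4], r[3]))


for row in single_sort_variant(ROWS):
    print(row)
```

The highest dense rank in a partition is reached by its last distinct brand, which is why a single ascending ranking is enough here.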