Assigning categories based on percentage within a group in SQL

Suppose I have a table like this:

CampaignId    Category    Strike
    1            A          2
    1            B          3
    1          Others       5
    2            A          4
    2            B          2
    3            C          1
    3            C          4
    4            A          1
    4            B          1
    4            C          1
    4            D          1
    4          Others       1

Then I compute, for each row, the percentage of Strike by Category within its CampaignId, like this:

SELECT CampaignId, Category, Strike,
       -- each row's Strike as a share of its campaign's total Strike
       (Strike::FLOAT / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
FROM myTable

The resulting intermediate table is:

CampaignId    Category    Strike    PercentageOfStrikesByCategoryByCampaignId
    1            A          2        20.0
    1            B          3        30.0
    1          Others       5        50.0
    2            A          4        66.6
    2            B          2        33.3
    3            C          1        20.0
    3            C          4        80.0
    4            A          1        20.0
    4            B          1        20.0
    4            C          1        20.0
    4            D          1        20.0
    4          Others       1        20.0
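
The percentages in the table above correspond to each row's Strike divided by the total Strike of its CampaignId. A minimal sketch to reproduce them locally, assuming Python's built-in sqlite3 module and an SQLite build with window-function support (3.25 or later); `strike_percentages` and the `Pct` alias are names made up for this illustration, and `100.0 * Strike` stands in for the `::FLOAT` cast, which SQLite does not have:

```python
import sqlite3

# Sample rows copied from the question: (CampaignId, Category, Strike).
ROWS = [
    (1, "A", 2), (1, "B", 3), (1, "Others", 5),
    (2, "A", 4), (2, "B", 2),
    (3, "C", 1), (3, "C", 4),
    (4, "A", 1), (4, "B", 1), (4, "C", 1), (4, "D", 1), (4, "Others", 1),
]

# Each row's share of its campaign total, as a percentage.
# Multiplying by 100.0 first forces floating-point division in SQLite.
PERCENT_SQL = """
SELECT CampaignId, Category, Strike,
       100.0 * Strike / SUM(Strike) OVER (PARTITION BY CampaignId) AS Pct
FROM myTable
ORDER BY CampaignId, Category, Strike
"""


def strike_percentages(rows):
    """Load rows into an in-memory table and return rows with their percentage."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE myTable (CampaignId INTEGER, Category TEXT, Strike INTEGER)"
        )
        conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
        return conn.execute(PERCENT_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in strike_percentages(ROWS):
        print(row)
```

The table shows 66.6 and 33.3 for CampaignId 2; the unrounded values are 66.66... and 33.33....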

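Now I want to assign a final label, say FinalCategory, based on the PercentageOfStrikesByCategoryByCampaignId computed above. The gist of the FinalCategory rule is: if one of the categories within a CampaignId is 'Others' AND its PercentageOfStrikesByCategoryByCampaignId >= 30.0, then all the remaining rows of that CampaignId group are labeled 'Others' as well. Otherwise, Category is copied straight into FinalCategory. The resulting table should look like this: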

CampaignId    Category    Strike    PercentageOfStrikesByCategoryByCampaignId    FinalCategory
    1            A          2        20.0                                        Others 
    1            B          3        30.0                                        Others
    1          Others       5        50.0                                        Others
    2            A          4        66.6                                        A
    2            B          2        33.3                                        B
    3            C          1        20.0                                        C
    3            C          4        80.0                                        C
    4            A          1        20.0                                        A
    4            B          1        20.0                                        B
    4            C          1        20.0                                        C
    4            D          1        20.0                                        D
    4          Others       1        20.0                                        Others

How can I achieve this with the simplest possible SQL query? Thanks in advance for your help!

SELECT CampaignId, Category, Strike, PercentageOfStrikesByCategoryByCampaignId,
CASE WHEN Others_count > 0 AND 
     MAX(CASE WHEN Category='Others' THEN PercentageOfStrikesByCategoryByCampaignId END) OVER (PARTITION BY CampaignId) >= 30 THEN 'Others'
ELSE Category END AS FinalCategory
FROM (
SELECT CampaignId, Category, Strike, 
-- each row's Strike as a share of its campaign's total Strike
(Strike::FLOAT / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
,SUM(CASE WHEN Category='Others' THEN 1 ELSE 0 END) OVER (PARTITION BY CampaignId) as Others_count
FROM myTable
) T

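What was added to the existing query:

  • Others_count per CampaignId, computed with a SUM window function
  • A CASE expression that uses Others_count together with a MAX window function to check whether the 'Others' row of the campaign has a percentage >= 30. If so, 'Others' is assigned as the final category; otherwise the category is used as is.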
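
The MAX-window check described above can be tried against the question's sample data. The following is a sketch only, assuming Python's built-in sqlite3 module and SQLite 3.25 or later; `final_categories` and the `Pct` alias are hypothetical names, and `100.0 * Strike` replaces the `::FLOAT` cast that SQLite lacks. It also leaves out the Others_count test, on the assumption that it may be optional: when a campaign has no 'Others' row, the MAX is NULL, the comparison is not true, and the CASE falls through to ELSE.

```python
import sqlite3

# Sample rows copied from the question: (CampaignId, Category, Strike).
ROWS = [
    (1, "A", 2), (1, "B", 3), (1, "Others", 5),
    (2, "A", 4), (2, "B", 2),
    (3, "C", 1), (3, "C", 4),
    (4, "A", 1), (4, "B", 1), (4, "C", 1), (4, "D", 1), (4, "Others", 1),
]

FINAL_SQL = """
WITH pct AS (
    SELECT CampaignId, Category, Strike,
           100.0 * Strike / SUM(Strike) OVER (PARTITION BY CampaignId) AS Pct
    FROM myTable
)
SELECT CampaignId, Category, Strike,
       CASE
           -- MAX skips the NULLs produced for non-'Others' rows, so it is the
           -- largest 'Others' percentage in the campaign, or NULL if none exists.
           WHEN MAX(CASE WHEN Category = 'Others' THEN Pct END)
                    OVER (PARTITION BY CampaignId) >= 30
           THEN 'Others'
           ELSE Category
       END AS FinalCategory
FROM pct
ORDER BY CampaignId, Category, Strike
"""


def final_categories(rows):
    """Return (CampaignId, Category, Strike, FinalCategory) for every input row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE myTable (CampaignId INTEGER, Category TEXT, Strike INTEGER)"
        )
        conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
        return conn.execute(FINAL_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in final_categories(ROWS):
        print(row)
```

Keeping Others_count does no harm; it just makes the intent more explicit to a reader who does not have the NULL behaviour of MAX in mind.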

Let's start with your query as a CTE or subquery:

WITH t as (
      SELECT CampaignId, Category, Strike, 
             -- each row's Strike as a share of its campaign's total Strike
             (Strike::FLOAT / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
      FROM myTable
     )
select t.*,
       (case when OthersFlag > 0 then 'Others' else category end) as FinalCategory
from (select t.*,
             sum(case when category = 'Others' and PercentageOfStrikesByCategoryByCampaignId >= 30.0 then 1 else 0 end) over
                 (partition by campaignid) as OthersFlag
      from t
     ) t;
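
Two edge cases are worth checking with the counting-flag approach: an 'Others' row sitting exactly on the 30.0 threshold (the question asks for >= 30.0), and a campaign with more than one 'Others' row, where the flag can exceed 1. The sketch below uses `>=` for the threshold and tests the flag with `> 0`. It assumes Python's built-in sqlite3 module and SQLite 3.25 or later; `label_rows`, `EDGE_ROWS`, the `Pct` alias and the campaign ids 10 to 12 are invented for this illustration and are not part of the question's data.

```python
import sqlite3

FLAG_SQL = """
WITH t AS (
    SELECT CampaignId, Category, Strike,
           100.0 * Strike / SUM(Strike) OVER (PARTITION BY CampaignId) AS Pct
    FROM myTable
),
flagged AS (
    SELECT t.*,
           -- Number of 'Others' rows in the campaign at or above the threshold.
           SUM(CASE WHEN Category = 'Others' AND Pct >= 30.0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY CampaignId) AS OthersFlag
    FROM t
)
SELECT CampaignId, Category, Strike, OthersFlag,
       CASE WHEN OthersFlag > 0 THEN 'Others' ELSE Category END AS FinalCategory
FROM flagged
ORDER BY CampaignId, Category, Strike
"""

# Hypothetical rows chosen to exercise the edge cases.
EDGE_ROWS = [
    (10, "A", 7), (10, "Others", 3),                     # 'Others' is exactly 30%
    (11, "A", 2), (11, "Others", 4), (11, "Others", 4),  # two 'Others' rows, 40% each
    (12, "A", 8), (12, "Others", 2),                     # 'Others' is 20%, below threshold
]


def label_rows(rows):
    """Return (CampaignId, Category, Strike, OthersFlag, FinalCategory) per row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE myTable (CampaignId INTEGER, Category TEXT, Strike INTEGER)"
        )
        conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
        return conn.execute(FLAG_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in label_rows(EDGE_ROWS):
        print(row)
```

In campaign 11 the flag is 2, so a test of the form `OthersFlag = 1` would leave that campaign unlabeled, while `OthersFlag > 0` catches it.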