Assigning categories based on percentage within a group in SQL
Suppose I have a table like this:
CampaignId Category Strike
1 A 2
1 B 3
1 Others 5
2 A 4
2 B 2
3 C 1
3 C 4
4 A 1
4 B 1
4 C 1
4 D 1
4 Others 1
I then calculate each Category's percentage of Strike within its CampaignId, as follows:
SELECT CampaignId, Category, Strike,
       -- category total divided by campaign total, expressed as a percentage
       (SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId, Category)
        / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId
FROM myTable
The resulting intermediate table is as follows:
CampaignId Category Strike PercentageOfStrikesByCategoryByCampaignId
1 A 2 20.0
1 B 3 30.0
1 Others 5 50.0
2 A 4 66.6
2 B 2 33.3
3 C 1 20.0
3 C 4 80.0
4 A 1 20.0
4 B 1 20.0
4 C 1 20.0
4 D 1 20.0
4 Others 1 20.0
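Now I want to assign a final label, say FinalCategory, based on the PercentageOfStrikesByCategoryByCampaignId computed above. The gist of the FinalCategory rule is: if one of the categories within a CampaignId is 'Others' AND its PercentageOfStrikesByCategoryByCampaignId is >= 30.0, then all the remaining rows in that CampaignId group are labeled 'Others'. Otherwise, Category is copied directly to FinalCategory. The resulting table should look like this: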
CampaignId Category Strike PercentageOfStrikesByCategoryByCampaignId FinalCategory
1 A 2 20.0 Others
1 B 3 30.0 Others
1 Others 5 50.0 Others
2 A 4 66.6 A
2 B 2 33.3 B
3 C 1 20.0 C
3 C 4 80.0 C
4 A 1 20.0 A
4 B 1 20.0 B
4 C 1 20.0 C
4 D 1 20.0 D
4 Others 1 20.0 Others
How can I achieve this with the simplest possible SQL query? Thanks in advance for your help!
SELECT CampaignId, Category, Strike, PercentageOfStrikesByCategoryByCampaignId,
       CASE WHEN Others_count > 0 AND
                 MAX(CASE WHEN Category = 'Others' THEN PercentageOfStrikesByCategoryByCampaignId END)
                     OVER (PARTITION BY CampaignId) >= 30
            THEN 'Others'
            ELSE Category
       END AS FinalCategory
FROM (
    SELECT CampaignId, Category, Strike,
           (SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId, Category)
            / SUM(Strike::FLOAT) OVER (PARTITION BY CampaignId) * 100) AS PercentageOfStrikesByCategoryByCampaignId,
           -- number of 'Others' rows in each campaign
           SUM(CASE WHEN Category = 'Others' THEN 1 ELSE 0 END) OVER (PARTITION BY CampaignId) AS Others_count
    FROM myTable
) T
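What was added to your existing query:
- Others_count for each CampaignId, computed with the sum window function.
- A case expression that uses Others_count together with the max window function to check whether the row with the 'Others' category has a percentage >= 30. If it does, 'Others' is assigned as the final category; otherwise the category is used as is.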
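To make the behavior of this approach easy to check, here is a minimal sketch that runs an equivalent query against the sample data in an in-memory SQLite database. Several things here are assumptions rather than part of the answer: the helper name `final_categories`, the shortened column alias `Pct`, the use of `CAST(Strike AS REAL)` in place of `::FLOAT` (which SQLite does not support), and the reading of the percentage as the category total divided by the campaign total. It also assumes an SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Sample rows from the question: (CampaignId, Category, Strike)
ROWS = [
    (1, "A", 2), (1, "B", 3), (1, "Others", 5),
    (2, "A", 4), (2, "B", 2),
    (3, "C", 1), (3, "C", 4),
    (4, "A", 1), (4, "B", 1), (4, "C", 1), (4, "D", 1), (4, "Others", 1),
]

# Same shape as the query above; Pct is a shortened alias (assumption).
QUERY = """
SELECT CampaignId, Category, Strike,
       CASE WHEN Others_count > 0 AND
                 MAX(CASE WHEN Category = 'Others' THEN Pct END)
                     OVER (PARTITION BY CampaignId) >= 30
            THEN 'Others'
            ELSE Category
       END AS FinalCategory
FROM (
    SELECT CampaignId, Category, Strike,
           SUM(CAST(Strike AS REAL)) OVER (PARTITION BY CampaignId, Category)
           / SUM(CAST(Strike AS REAL)) OVER (PARTITION BY CampaignId) * 100 AS Pct,
           SUM(CASE WHEN Category = 'Others' THEN 1 ELSE 0 END)
               OVER (PARTITION BY CampaignId) AS Others_count
    FROM myTable
) T
ORDER BY CampaignId, Category, Strike
"""


def final_categories(rows):
    """Load rows into an in-memory table and return the labeled result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (CampaignId INTEGER, Category TEXT, Strike INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in final_categories(ROWS):
        print(row)
```

In campaigns with no 'Others' row the inner `CASE` yields only NULLs, so the windowed `MAX` is NULL and the comparison falls through to `ELSE Category`; the `Others_count > 0` test states that intent explicitly.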
Let's start with your query as a CTE or subquery:
with t as (
      select CampaignId, Category, Strike,
             (sum(Strike::FLOAT) over (partition by CampaignId, Category)
              / sum(Strike::FLOAT) over (partition by CampaignId) * 100) as PercentageOfStrikesByCategoryByCampaignId
      from myTable
     )
select t.*,
       (case when OthersFlag > 0 then 'Others' else category end) as FinalCategory
from (select t.*,
             -- counts the 'Others' rows in the campaign that reach the 30% threshold
             sum(case when category = 'Others' and PercentageOfStrikesByCategoryByCampaignId >= 30.0 then 1 else 0 end) over
                 (partition by campaignid) as OthersFlag
      from t
     ) t;
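The flag-based approach can be sketched the same way in an in-memory SQLite database. The windowed sum counts qualifying 'Others' rows rather than producing a strict 0/1 value, so this sketch tests the flag with `> 0` and uses `>= 30.0` to match the rule stated in the question. The helper name `label_rows`, the alias `Pct`, the subquery alias `flagged`, `CAST(Strike AS REAL)` instead of `::FLOAT`, and campaign 5 (a hypothetical campaign with two 'Others' rows, not in the question's data) are all assumptions made for illustration. SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

QUERY = """
WITH t AS (
    SELECT CampaignId, Category, Strike,
           SUM(CAST(Strike AS REAL)) OVER (PARTITION BY CampaignId, Category)
           / SUM(CAST(Strike AS REAL)) OVER (PARTITION BY CampaignId) * 100 AS Pct
    FROM myTable
)
SELECT CampaignId, Category, Strike,
       CASE WHEN OthersFlag > 0 THEN 'Others' ELSE Category END AS FinalCategory
FROM (
    SELECT t.*,
           -- number of 'Others' rows in the campaign at or above 30%
           SUM(CASE WHEN Category = 'Others' AND Pct >= 30.0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY CampaignId) AS OthersFlag
    FROM t
) flagged
ORDER BY CampaignId, Category, Strike
"""


def label_rows(rows):
    """Return (CampaignId, Category, Strike, FinalCategory) tuples for rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (CampaignId INTEGER, Category TEXT, Strike INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "A", 2), (1, "B", 3), (1, "Others", 5),           # 'Others' at 50%
        (4, "A", 1), (4, "B", 1), (4, "C", 1), (4, "D", 1),
        (4, "Others", 1),                                     # 'Others' at 20%
        (5, "A", 2), (5, "Others", 3), (5, "Others", 5),      # hypothetical
    ]
    for row in label_rows(sample):
        print(row)
```

With two qualifying 'Others' rows in campaign 5 the flag evaluates to 2, which is why comparing it with `> 0` is the safer test if a campaign can contain more than one such row.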