How do I use OVER with COUNT() multiple times in one expression?
I have a question about a query I wrote to solve a LeetCode problem. The problem is:
Ads
+-------------+------+
| Column Name | Type |
+-------------+------+
| ad_id       | int  |
| user_id     | int  |
| action      | enum |
+-------------+------+
(ad_id, user_id) is the primary key for this table.
Each row of this table contains the ID of an Ad, the ID of a user and
the action taken by this user regarding this Ad. The action column is
an ENUM type of ('Clicked', 'Viewed', 'Ignored').
A company is running Ads and wants to calculate the performance of
each Ad.
Performance of the Ad is measured using Click-Through Rate (CTR)
where:
CTR = 0                                       if there are no ad clicks
CTR = Ad clicks / (Ad clicks + Ad views)      otherwise
Write an SQL query to find the ctr of each Ad.
Round ctr to 2 decimal points. Order the result table by ctr in
descending order and by ad_id in ascending order in case of a tie.
The query result format is in the following example:
Ads table:
+-------+---------+---------+
| ad_id | user_id | action  |
+-------+---------+---------+
| 1     | 1       | Clicked |
| 2     | 2       | Clicked |
| 3     | 3       | Viewed  |
| 5     | 5       | Ignored |
| 1     | 7       | Ignored |
| 2     | 7       | Viewed  |
| 3     | 5       | Clicked |
| 1     | 4       | Viewed  |
| 2     | 11      | Viewed  |
| 1     | 2       | Clicked |
+-------+---------+---------+
Here is a fiddle with the sample data and my attempted solution. The attempted solution is reproduced below:
SELECT DISTINCT t.ad_id, ROUND(
IF(
COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) = 0,
0,
COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) * 100 / ( COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) + COUNT(v.ad_id) OVER (PARTITION BY t.ad_id) )
), 2) as ctr
FROM Ads as t
LEFT JOIN Ads as c ON c.ad_id=t.ad_id AND c.user_id=t.user_id AND c.action='Clicked'
LEFT JOIN Ads as v ON v.ad_id=t.ad_id AND v.user_id=t.user_id AND v.action='Viewed'
GROUP BY t.ad_id, c.ad_id, v.ad_id
ORDER BY ctr DESC, t.ad_id
The result of this query:
ad_id ctr
1 50.00
2 50.00
3 50.00
5 0.00
The correct result should be:
ad_id ctr
1, 66.67
3, 50.00
2, 33.33
5, 0.00
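Looking at the sample data, my guess is that COUNT() is not actually being partitioned by t.ad_id the way I expected. The 50% CTR results could be explained if my CTR calculation were counting all 'Clicked' and all 'Viewed' instances. (On the other hand, removing the OVER clauses from the CTR calculation, only from the calculation and not from the condition, does not produce the result above, as my hypothesis would imply. So I'm not sure about this.)
Is there something wrong with the way I'm using OVER? Is my logic flawed here?
Also, a bonus question: I chose to use JOINs here because I assumed a JOIN is faster than a subquery. Is that a fair assumption? I'm studying for a Data Analyst 1 interview. Do you think an interviewer would care whether I use a JOIN or a subquery?
Edit: Thanks to forpas's explanation, I was able to come up with a much simpler solution than my original one. I think the solution in forpas's answer below is probably still preferable, because it explicitly handles NULLs in the table.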
SELECT ad_id, ROUND(IF(
SUM(action='Clicked') = 0,
0,
SUM(action='Clicked') * 100 / ( SUM(action='Clicked') + SUM(action='Viewed'))
), 2) as ctr
FROM Ads
GROUP BY ad_id
ORDER BY ctr DESC, ad_id
You can do it with conditional aggregation:
SELECT ad_id,
ROUND(100 * COALESCE(SUM(action = 'Clicked') / SUM(action IN ('Clicked', 'Viewed')), 0), 2) ctr
FROM Ads
GROUP BY ad_id
ORDER BY ctr DESC, ad_id;
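If you want to check the conditional-aggregation approach against the sample data without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. This is an assumption on my part, not the setup from the question: SQLite stands in for MySQL, so the query is adapted slightly (a `1.0 *` factor to avoid integer division and a `NULLIF` guard on the denominator), and the names `SAMPLE_ROWS`, `CTR_QUERY`, and `ctr_by_ad` are made up for illustration.

```python
import sqlite3

# Sample rows copied from the question's Ads table: (ad_id, user_id, action).
SAMPLE_ROWS = [
    (1, 1, "Clicked"), (2, 2, "Clicked"), (3, 3, "Viewed"), (5, 5, "Ignored"),
    (1, 7, "Ignored"), (2, 7, "Viewed"), (3, 5, "Clicked"), (1, 4, "Viewed"),
    (2, 11, "Viewed"), (1, 2, "Clicked"),
]

# SQLite adaptation of the conditional-aggregation query (assumption: SQLite,
# not MySQL). SUM() over a comparison yields an integer in SQLite, so the
# numerator is multiplied by 1.0 to force floating-point division.
CTR_QUERY = """
    SELECT ad_id,
           ROUND(100.0 * COALESCE(
               1.0 * SUM(action = 'Clicked')
                   / NULLIF(SUM(action IN ('Clicked', 'Viewed')), 0),
               0), 2) AS ctr
    FROM Ads
    GROUP BY ad_id
    ORDER BY ctr DESC, ad_id
"""


def ctr_by_ad(rows):
    """Load rows into an in-memory Ads table and return [(ad_id, ctr), ...]."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Ads (ad_id INTEGER, user_id INTEGER, action TEXT, "
            "PRIMARY KEY (ad_id, user_id))"
        )
        conn.executemany("INSERT INTO Ads VALUES (?, ?, ?)", rows)
        return conn.execute(CTR_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for ad_id, ctr in ctr_by_ad(SAMPLE_ROWS):
        print(ad_id, ctr)
```

The ad with only 'Ignored' rows has a zero denominator, which becomes NULL and is then mapped to 0 by COALESCE, matching the "0 if no ad clicks" rule in the problem statement.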
You can get the same result with the SUM() window function, but I don't think it is any better for performance or readability:
SELECT DISTINCT ad_id,
ROUND(
100 *
COALESCE(
SUM(action = 'Clicked') OVER (PARTITION BY ad_id) /
SUM(action IN ('Clicked', 'Viewed')) OVER (PARTITION BY ad_id)
, 0
)
, 2
) ctr
FROM Ads
ORDER BY ctr DESC, ad_id;
See the demo.
Result:
> ad_id | ctr
> ----: | ----:
> 1 | 66.67
> 3 | 50.00
> 2 | 33.33
> 5 | 0.00
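As for why the query in the question returns 50.00: window functions are evaluated over the rows that remain after GROUP BY, so COUNT(c.ad_id) OVER (PARTITION BY t.ad_id) sees one row per (t.ad_id, c.ad_id, v.ad_id) group rather than one row per click. The rough sketch below shows those grouped rows for the sample data. It again assumes Python's sqlite3 in place of MySQL, and `grouped_rows` and `window_counts` are hypothetical helper names, with `window_counts` only mimicking the window COUNT in plain Python.

```python
import sqlite3

# Sample rows copied from the question's Ads table: (ad_id, user_id, action).
SAMPLE_ROWS = [
    (1, 1, "Clicked"), (2, 2, "Clicked"), (3, 3, "Viewed"), (5, 5, "Ignored"),
    (1, 7, "Ignored"), (2, 7, "Viewed"), (3, 5, "Clicked"), (1, 4, "Viewed"),
    (2, 11, "Viewed"), (1, 2, "Clicked"),
]

# Same joins and GROUP BY as the question's query, but selecting the grouping
# keys themselves so the rows the window functions operate on are visible.
GROUPED_QUERY = """
    SELECT t.ad_id,
           c.ad_id  AS clicked_key,
           v.ad_id  AS viewed_key,
           COUNT(*) AS rows_in_group
    FROM Ads AS t
    LEFT JOIN Ads AS c
           ON c.ad_id = t.ad_id AND c.user_id = t.user_id AND c.action = 'Clicked'
    LEFT JOIN Ads AS v
           ON v.ad_id = t.ad_id AND v.user_id = t.user_id AND v.action = 'Viewed'
    GROUP BY t.ad_id, c.ad_id, v.ad_id
    ORDER BY t.ad_id, c.ad_id, v.ad_id
"""


def grouped_rows(rows):
    """Return the rows left after the question's joins and GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Ads (ad_id INTEGER, user_id INTEGER, action TEXT, "
            "PRIMARY KEY (ad_id, user_id))"
        )
        conn.executemany("INSERT INTO Ads VALUES (?, ?, ?)", rows)
        return conn.execute(GROUPED_QUERY).fetchall()
    finally:
        conn.close()


def window_counts(grouped):
    """Mimic COUNT(key) OVER (PARTITION BY ad_id) on the grouped rows.

    Returns {ad_id: (non-NULL clicked keys, non-NULL viewed keys)}.
    """
    counts = {}
    for ad_id, clicked_key, viewed_key, _ in grouped:
        clicks, views = counts.get(ad_id, (0, 0))
        counts[ad_id] = (clicks + (clicked_key is not None),
                         views + (viewed_key is not None))
    return counts


if __name__ == "__main__":
    grouped = grouped_rows(SAMPLE_ROWS)
    for row in grouped:
        print(row)
    print(window_counts(grouped))
```

Every ad that has both clicks and views ends up with exactly one "clicked" group and one "viewed" group, however many rows each group contains, so the expression reduces to 1 * 100 / (1 + 1) = 50. The per-group row counts (the `rows_in_group` column here) are what the CTR actually needs, which is why plain conditional aggregation is simpler.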
SELECT t1.ad_id,
(CASE
WHEN t2.clicked/t1.total IS NULL THEN 0
ELSE round((t2.clicked/t1.total)*100,2) END) as ctr
FROM
(SELECT ad_id,SUM(CASE WHEN action IN ('Viewed','Clicked') THEN 1 ELSE 0 END) as total
FROM Ads
GROUP BY 1
)t1
LEFT JOIN
(SELECT ad_id,SUM(CASE WHEN action IN('Clicked') THEN 1 ELSE 0 END) as clicked
FROM Ads
GROUP BY 1)t2
ON t1.ad_id = t2.ad_id
ORDER BY 2 DESC, ad_id;