How to use the result of an aggregate function as a filter criterion
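I have tables for booking orders:
Bookings (booking_id, booking_time, driver_id, customer_id)
Drivers (driver_id, name)
I need to identify all customers for whom at least half of their bookings in the last 30 days were completed by the same driver.
Illustration:
Customer x has 12 bookings, 7 of which were completed by driver_01.
Customer y has 10 bookings, 4 of which were completed by driver_02.
Customer z made 3 bookings, all 3 of which were completed by driver_03.
The output should return driver_01 and driver_03 along with the booking_id.
I have tried using a self-join and a COUNT aggregate, but I still don't understand the logic.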
You correctly identified that a join is useful for getting the total number of bookings per customer, so:
DROP TABLE IF EXISTS BOOKINGS,drivers;
create table Bookings (booking_id int, driver_id int, customer_id varchar(3));
create table Drivers (driver_id int, name varchar(3));
insert into bookings values
(1,1,1),(2,1,1),(3,2,1),(4,2,1),(5,3,1),
(6,1,2),(7,2,1);
insert into drivers values
(1,'aaa'),(2,'bbb');
select b.driver_id,d.name,b.customer_id,count(*) bcount,scount, count(*) / scount * 100 percent
from bookings b
join (select customer_id,count(*) scount from bookings group by customer_id) s
on s.customer_id = b.customer_id
join drivers d on d.driver_id = b.driver_id
group by driver_id,d.name,customer_id having count(*) / scount * 100 >= 50;
+-----------+------+-------------+--------+--------+----------+
| driver_id | name | customer_id | bcount | scount | percent |
+-----------+------+-------------+--------+--------+----------+
| 1 | aaa | 2 | 1 | 1 | 100.0000 |
| 2 | bbb | 1 | 3 | 6 | 50.0000 |
+-----------+------+-------------+--------+--------+----------+
2 rows in set (0.002 sec)
I tested with 50% because it is easier to check than 60%; don't forget to change the threshold to match your requirement.
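To move the threshold to 60%, only the HAVING clause has to change. A minimal sketch against the sample tables above: it compares the counts by multiplication rather than division, so the filter does not depend on how a given database handles integer division (MySQL and MariaDB return a decimal for `/`, some other databases truncate). Qualifying the columns and adding s.scount to the GROUP BY is my own adjustment, intended to keep the query valid under stricter GROUP BY modes.

```sql
-- Same join as above; only the threshold and the column qualification differ.
SELECT b.driver_id,
       d.name,
       b.customer_id,
       COUNT(*)                    AS bcount,   -- bookings by this driver for this customer
       s.scount,                                -- all bookings of this customer
       COUNT(*) * 100.0 / s.scount AS percent
FROM bookings b
JOIN (SELECT customer_id, COUNT(*) AS scount
      FROM bookings
      GROUP BY customer_id) s ON s.customer_id = b.customer_id
JOIN drivers d ON d.driver_id = b.driver_id
GROUP BY b.driver_id, d.name, b.customer_id, s.scount
HAVING COUNT(*) * 100 >= 60 * s.scount;         -- 60% threshold without division
```

On the sample data this should leave only driver 1 (aaa) for customer 2, since driver 2 handled exactly 3 of customer 1's 6 bookings (50%).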
You can do this with window functions:
select b.*, d.name as driver_name
from drivers d
inner join (
    select b.*,
        count(*) over(partition by driver_id, customer_id) / count(*) over(partition by customer_id) as driver_ratio
    from bookings b
) b on b.driver_id = d.driver_id
where driver_ratio >= 0.6;  -- use 0.5 for "at least half"
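A window function cannot be referenced in the WHERE clause of the same query level, which is why the ratio is computed in a derived table first and filtered outside it. Below is a sketch of the same idea with the 30-day restriction from the question applied before counting. It assumes the booking_time column from the question's schema (the sample tables in the other answer omit it) and a server with CTE and window function support (MySQL 8.0+ or MariaDB 10.2+). The CTE name recent and the column aliases are my own.

```sql
WITH recent AS (
    SELECT b.booking_id,
           b.customer_id,
           b.driver_id,
           -- bookings this driver completed for this customer in the window
           COUNT(*) OVER (PARTITION BY b.customer_id, b.driver_id) AS driver_bookings,
           -- all bookings of this customer in the window
           COUNT(*) OVER (PARTITION BY b.customer_id)              AS total_bookings
    FROM bookings b
    WHERE b.booking_time >= NOW() - INTERVAL 30 DAY   -- filter first, then count
)
SELECT r.booking_id, r.customer_id, r.driver_id, d.name AS driver_name
FROM recent r
JOIN drivers d ON d.driver_id = r.driver_id
WHERE r.driver_bookings * 2 >= r.total_bookings       -- "at least half"
ORDER BY r.customer_id, r.booking_id;
```

This returns one row per qualifying booking, so the booking_id asked for in the question is available directly.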
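I tested @P.Salmon's code, since I had a similar approach, and found that the answer is correct. However, if you filter by date, say you only want to return the last 30 days as specified in the question, it may not work.
See below: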
SELECT b.booking_date,b.driver_id, d.name, b.customer_id, COUNT(*) b_count, c_count, COUNT(*) / c_count * 100 percent
FROM bookings b
JOIN (SELECT customer_id, COUNT(*) c_count from bookings GROUP BY customer_id) c ON c.customer_id = b.customer_id
JOIN drivers d ON d.driver_id = b.driver_id
WHERE b.booking_date BETWEEN NOW() - INTERVAL 30 DAY AND NOW()
GROUP BY booking_date, driver_id, d.name, customer_id
HAVING COUNT(*) / c_count * 100 >= 50;
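Two things in the date-filtered query above are worth checking. The derived table c still counts every booking a customer has ever made, while the outer COUNT(*) only sees the last 30 days, so the percentage mixes two different periods. Grouping by booking_date also splits each driver's bookings into one group per distinct date value. A possible rewrite, assuming the booking_time column from the question's schema (the query above calls it booking_date); the alias names are illustrative:

```sql
SELECT b.customer_id,
       b.driver_id,
       d.name,
       COUNT(*)                          AS driver_bookings,
       t.total_bookings,
       COUNT(*) / t.total_bookings * 100 AS percent
FROM bookings b
JOIN (
    -- per-customer total, restricted to the same 30-day window
    SELECT customer_id, COUNT(*) AS total_bookings
    FROM bookings
    WHERE booking_time >= NOW() - INTERVAL 30 DAY
    GROUP BY customer_id
) t ON t.customer_id = b.customer_id
JOIN drivers d ON d.driver_id = b.driver_id
WHERE b.booking_time >= NOW() - INTERVAL 30 DAY
-- no date column here: one group per customer/driver pair
GROUP BY b.customer_id, b.driver_id, d.name, t.total_bookings
HAVING COUNT(*) * 2 >= t.total_bookings;   -- at least half
```

The inner join to drivers drops bookings whose driver_id has no row in Drivers; a LEFT JOIN would keep them if they should still be listed.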