How to use the result of an aggregate function as a filter criterion

I have the following tables for bookings:

Bookings (booking_id, booking_time, driver_id, customer_id)
Drivers (driver_id, name)

I need to find all customers for whom at least half of their bookings in the past 30 days were completed by the same driver.

Example:

The output should return driver_01 and driver_03 along with their booking_id.

I've tried using a self-join and a COUNT aggregate, but I still don't understand the logic.

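You correctly identified that a join is useful for getting each customer's total number of bookings, so something like this works: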

DROP TABLE IF EXISTS bookings, drivers;

create table bookings (booking_id int, driver_id int, customer_id varchar(3));

create table drivers (driver_id int, name varchar(3));

insert into bookings values
(1,1,1),(2,1,1),(3,2,1),(4,2,1),(5,3,1),
(6,1,2),(7,2,1);

insert into drivers values
(1,'aaa'),(2,'bbb');

-- s holds the total number of bookings per customer;
-- the outer query counts bookings per (driver, customer) pair and keeps pairs at or above the threshold
select b.driver_id, d.name, b.customer_id, count(*) bcount, s.scount, count(*) / s.scount * 100 percent
from bookings b
join (select customer_id, count(*) scount from bookings group by customer_id) s
        on s.customer_id = b.customer_id
join drivers d on d.driver_id = b.driver_id
group by b.driver_id, d.name, b.customer_id, s.scount
having count(*) / s.scount * 100 >= 50;

+-----------+------+-------------+--------+--------+----------+
| driver_id | name | customer_id | bcount | scount | percent  |
+-----------+------+-------------+--------+--------+----------+
|         1 | aaa  | 2           |      1 |      1 | 100.0000 |
|         2 | bbb  | 1           |      3 |      6 |  50.0000 |
+-----------+------+-------------+--------+--------+----------+
2 rows in set (0.002 sec)

Testing for 50% is easier than testing for 60%, so don't forget to change the threshold to match your requirement.
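The demo tables above have no time column, so the "past 30 days" part of the requirement cannot be tried against them. A hypothetical setup for that (the column name booking_time comes from the question's schema; the times themselves are made up for illustration):

```sql
-- Hypothetical extension of the demo table: give each booking a time.
ALTER TABLE bookings ADD COLUMN booking_time DATETIME;

-- Most bookings are recent ...
UPDATE bookings SET booking_time = NOW() - INTERVAL 5 DAY;
-- ... but bookings 3 and 4 (driver 2, customer 1) fall outside the 30-day window.
UPDATE bookings SET booking_time = NOW() - INTERVAL 60 DAY WHERE booking_id IN (3, 4);

-- Per-customer totals inside the window (the denominator of the ratio).
SELECT customer_id, COUNT(*) AS recent_count
FROM bookings
WHERE booking_time >= NOW() - INTERVAL 30 DAY
GROUP BY customer_id;
```

With these times, customer 1 has four bookings inside the window (1, 2, 5 and 7). Within the window, driver 1 has 2 of 4 (50%) and driver 2 only 1 of 4 (25%), whereas over all time driver 2 had 3 of 6 (50%). This is why the date condition has to restrict the per-customer total as well as the per-driver count.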

You can also do this with window functions (MySQL 8.0+ or MariaDB 10.2+):

select b.*, d.name as driver_name
from drivers d
inner join (
    select b.*,
        -- share of this customer's bookings that were handled by this driver
        count(*) over(partition by driver_id, customer_id) / count(*) over(partition by customer_id) as driver_ratio
    from bookings b
) b on b.driver_id = d.driver_id
where driver_ratio >= 0.6;
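The window-function query above does not yet apply the 30-day restriction. A sketch of one way to add it, assuming bookings.booking_time is a DATETIME as in the question's schema: window functions are evaluated after WHERE, so filtering inside the derived table makes both counts see only the last 30 days. The 0.5 threshold follows the "at least half" wording of the question; adjust it as needed.

```sql
-- Sketch (MySQL 8.0+): window-function approach restricted to the last 30 days.
-- Assumes bookings.booking_time exists and is a DATETIME.
SELECT b.booking_id, b.booking_time, b.driver_id, d.name AS driver_name,
       b.customer_id, b.driver_ratio
FROM drivers d
INNER JOIN (
    SELECT bk.*,
           -- both counts only cover rows that pass the WHERE clause below
           COUNT(*) OVER (PARTITION BY bk.driver_id, bk.customer_id)
             / COUNT(*) OVER (PARTITION BY bk.customer_id) AS driver_ratio
    FROM bookings bk
    WHERE bk.booking_time >= NOW() - INTERVAL 30 DAY
) b ON b.driver_id = d.driver_id
WHERE b.driver_ratio >= 0.5;
```

Because the join to drivers happens after the ratio is computed, bookings by a driver missing from the drivers table still count toward the customer's total, which keeps the denominator honest.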

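I tested @P.Salmon's code because I had a similar approach, and found that @P.Salmon's answer is correct. However, if you filter by date, for example to return only the last 30 days as specified in your question, it may not work as is.

See below: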

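-- The 30-day condition must restrict both the per-customer total (subquery c)
-- and the per-driver count (outer WHERE); otherwise c_count still counts all bookings.
-- The booking time is not grouped on, since that would split each driver's count by date.
SELECT b.driver_id, d.name, b.customer_id, COUNT(*) b_count, c.c_count, COUNT(*) / c.c_count * 100 percent
FROM bookings b
JOIN (SELECT customer_id, COUNT(*) c_count
      FROM bookings
      WHERE booking_time BETWEEN NOW() - INTERVAL 30 DAY AND NOW()
      GROUP BY customer_id) c ON c.customer_id = b.customer_id
JOIN drivers d ON d.driver_id = b.driver_id
WHERE b.booking_time BETWEEN NOW() - INTERVAL 30 DAY AND NOW()
GROUP BY b.driver_id, d.name, b.customer_id, c.c_count
HAVING COUNT(*) / c.c_count * 100 >= 50;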