Calculating acceptance_ratio with LEFT JOIN, a self join, and aggregate functions
I am trying to calculate the daily acceptance ratio from a 'connecting' table, which has 4 fields. Sample values:
date, action, sender_id, recipient_id
'2017-01-05', 'request_link', 'frank', 'joe'
'2017-01-06', 'request_link', 'sally', 'ann'
'2017-01-07', 'request_link', 'bill', 'ted'
'2017-01-07', 'accept_link', 'joe', 'frank'
'2017-01-06', 'accept_link', 'ann', 'sally'
'2017-01-06', 'accept_link', 'ted', 'bill'
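Since 01-05 has 0 accepts and 1 request, its daily acceptance ratio should be 0/1 = 0. Likewise, the ratio for 01-06 should be 2/1, and the ratio for 01-07 should be 1/1.
However, it is important that each accept_link has a corresponding request_link, where the sender_id of the request_link = the recipient_id of the accept_link (and vice versa). So I think a self join is needed here, to make sure that Joe is accepting Frank's request, regardless of the date.
How can the query below be corrected so that the aggregation works properly while keeping the required join conditions? If the two WHERE conditions are removed, would the query compute correctly as is, or are they necessary?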
SELECT f1.date,
       SUM(CASE WHEN f2.action = 'accept_link' THEN 1 ELSE 0 END) /
       SUM(CASE WHEN f2.action = 'request_link' THEN 1 ELSE 0 END) AS acceptance_ratio
FROM connecting f1
LEFT JOIN connecting f2
    ON f1.sender_id = f2.recipient_id
LEFT JOIN connecting f2
    ON f1.recipient_id = f2.sender_id
WHERE f1.action = 'request_link'
  AND f2.action = 'accept_link'
GROUP BY f1.date
ORDER BY f1.date ASC
The expected output should look like this:
date acceptance_ratio
'2017-01-05' 0.0000
'2017-01-06' 2.0000
'2017-01-07' 1.0000
Thanks in advance.
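Again, I don't think you need a self join just to do the counting here. Instead, use conditional aggregation over the whole table to count the requests and accepts that occur on each day. The single LEFT JOIN in the query below only serves to confirm that each accept has a matching request: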
SELECT t.date,
       CASE WHEN t.num_requests = 0
            THEN 'No requests available'
            ELSE CAST(t.num_accepts / t.num_requests AS CHAR(50))
       END AS acceptance_ratio
FROM
(
    SELECT c1.date,
           -- count an accept only when a matching request was found
           SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                    THEN 1 ELSE 0 END) AS num_accepts,
           SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
    FROM connecting c1
    LEFT JOIN connecting c2
        ON c1.action = 'accept_link' AND
           c2.action = 'request_link' AND
           -- the accept's sender and recipient are the request's recipient and sender
           c1.sender_id = c2.recipient_id AND
           c1.recipient_id = c2.sender_id
    GROUP BY c1.date
) t
ORDER BY t.date
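Note that I use a CASE expression to handle division by zero, which could happen on a day with no requests. I am also assuming here that the same invitation is never sent more than once.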
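To try this out on the sample rows from the question, here is a minimal sketch using Python's standard sqlite3 module. It is an illustration under some assumptions, not the exact query above: the function name daily_acceptance_ratio is made up for this example, the table is an in-memory SQLite copy of 'connecting', the join matches each accept to a request with sender and recipient swapped, and the division is done in Python so the result does not depend on how a given SQL dialect divides integers. A day with no requests yields None instead of a message string.

```python
import sqlite3

# Sample rows from the question: (date, action, sender_id, recipient_id)
ROWS = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

# Per-day counts. An accept is counted only if some request exists whose
# sender/recipient are the accept's recipient/sender, regardless of date.
QUERY = """
SELECT c1.date,
       SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                THEN 1 ELSE 0 END) AS num_accepts,
       SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
FROM connecting c1
LEFT JOIN connecting c2
       ON c1.action = 'accept_link'
      AND c2.action = 'request_link'
      AND c1.sender_id = c2.recipient_id
      AND c1.recipient_id = c2.sender_id
GROUP BY c1.date
ORDER BY c1.date
"""


def daily_acceptance_ratio(rows):
    """Return [(date, ratio)] per day; ratio is None when a day has no requests."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE connecting "
            "(date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)"
        )
        conn.executemany("INSERT INTO connecting VALUES (?, ?, ?, ?)", rows)
        result = []
        for date, num_accepts, num_requests in conn.execute(QUERY):
            # Divide in Python to avoid dialect-specific integer division.
            ratio = num_accepts / num_requests if num_requests else None
            result.append((date, ratio))
        return result
    finally:
        conn.close()


if __name__ == "__main__":
    for date, ratio in daily_acceptance_ratio(ROWS):
        print(date, ratio)
```

On the sample data this gives ratios of 0.0, 2.0 and 1.0 for 01-05, 01-06 and 01-07. If the same invitation could be sent more than once, the join would produce one row per duplicate request and inflate num_accepts, so the counts are only reliable under the no-duplicates assumption stated above.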