Calculating acceptance_ratio with a LEFT JOIN, a self-join, and an aggregate function

I'm trying to calculate a daily acceptance ratio from the 'connecting' table, which has 4 fields. Here they are with sample values:

date          action         sender_id        recipient_id 
'2017-01-05', 'request_link', 'frank', 'joe' 
'2017-01-06', 'request_link', 'sally', 'ann' 
'2017-01-07', 'request_link', 'bill', 'ted' 
'2017-01-07', 'accept_link', 'joe', 'frank' 
'2017-01-06', 'accept_link', 'ann', 'sally' 
'2017-01-06', 'accept_link', 'ted', 'bill' 

Because 01-05 has 0 accepts and 1 request, its daily acceptance ratio should be 0/1 = 0. Likewise, the ratio for 01-06 should be 2/1, and for 01-07 it should be 1/1.
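To sanity-check these expected numbers outside the database, here is a minimal Python sketch that counts requests and accepts per day over the sample rows above. The function name `daily_acceptance_ratio` is hypothetical, and this sketch deliberately does not check whether an accept matches a request; it only reproduces the per-day arithmetic.

```python
from collections import Counter

# Sample rows from the question: (date, action, sender_id, recipient_id)
ROWS = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

def daily_acceptance_ratio(rows):
    """Return {date: accepts / requests}; None for days without requests."""
    requests = Counter(d for d, action, _, _ in rows if action == "request_link")
    accepts = Counter(d for d, action, _, _ in rows if action == "accept_link")
    ratios = {}
    for day in sorted(set(requests) | set(accepts)):
        # Counter returns 0 for missing days, so 01-05 gives 0 / 1 = 0.0
        ratios[day] = accepts[day] / requests[day] if requests[day] else None
    return ratios

for day, ratio in daily_acceptance_ratio(ROWS).items():
    print(day, ratio)
```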

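However, what matters is that every accept_link has a corresponding request_link in which the request_link's sender_id equals the accept_link's recipient_id (and vice versa). That's why I think a self-join is needed here: to make sure that Joe is accepting Frank's request, regardless of the date.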
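The matching rule described above (an accept from A to B answers a request from B to A, on any date) can be illustrated with a small Python sketch over the same sample rows. The helper `matched_accepts` is hypothetical and only shows the swapped-pair comparison that the self-join has to express.

```python
# Sample rows from the question: (date, action, sender_id, recipient_id)
ROWS = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

def matched_accepts(rows):
    """Return the accept_link rows that have a request_link with sender
    and recipient swapped, ignoring dates."""
    # (sender, recipient) pair of every request, regardless of date
    request_pairs = {(s, r) for _, action, s, r in rows if action == "request_link"}
    # accept (sender=A, recipient=B) matches request (sender=B, recipient=A)
    return [row for row in rows
            if row[1] == "accept_link" and (row[3], row[2]) in request_pairs]

for row in matched_accepts(ROWS):
    print(row)
```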

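How can I correct the query below so that the aggregation works while keeping the required join conditions? Would the query calculate the ratio correctly as-is if I removed the two WHERE conditions, or are they necessary?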

SELECT f1.date, 
    SUM(CASE WHEN f2.action = 'accept_link' THEN 1 ELSE 0 END) /
    SUM(CASE WHEN f2.action = 'request_link' THEN 1 ELSE 0 END) AS acceptance_ratio
FROM connecting f1
LEFT JOIN connecting f2
ON f1.sender_id = f2.recipient_id
LEFT JOIN connecting f2
ON f1.recipient_id = f2.sender_id
WHERE f1.action = 'request_link'
AND f2.action = 'accept_link'
GROUP BY f1.date
ORDER BY f1.date ASC

The expected output should look like this:

date          acceptance_ratio
'2017-01-05'  0.0000
'2017-01-06'  2.0000
'2017-01-07'  1.0000

Thanks in advance.

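Again, I don't think you need a self-join to drive the aggregation. Instead, use conditional aggregation over the entire table and count the requests and accepts that happened on each day; the LEFT JOIN back to the same table only checks whether each accept has a matching request (same two users, with sender and recipient swapped):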

SELECT t.date,
       CASE WHEN t.num_requests = 0
            THEN 'No requests available'
            ELSE CAST(t.num_accepts / t.num_requests AS CHAR(50))
       END AS acceptance_ratio
FROM
(
    SELECT c1.date,
           SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                    THEN 1 ELSE 0 END) AS num_accepts,
           SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
    FROM connecting c1
    LEFT JOIN connecting c2
        ON c1.action       = 'accept_link'   AND
           c2.action       = 'request_link'  AND
           c1.sender_id    = c2.recipient_id AND
           c1.recipient_id = c2.sender_id
    GROUP BY c1.date
) t
ORDER BY t.date
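One way to try this query against the sample data is Python's built-in `sqlite3` module with an in-memory table. This is a sketch under stated assumptions: it runs on SQLite rather than MySQL, so it adapts two things. It multiplies by `1.0` because SQLite's `/` truncates integer operands (MySQL's `/` already returns a decimal), and it returns `NULL` instead of the 'No requests available' string, which keeps the column numeric. The join condition matches both users in swapped order (`c1.sender_id = c2.recipient_id` and `c1.recipient_id = c2.sender_id`). The helper name `acceptance_ratios` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (date, action, sender_id, recipient_id)
ROWS = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

QUERY = """
SELECT t.date,
       CASE WHEN t.num_requests = 0
            THEN NULL                                -- no requests that day
            ELSE 1.0 * t.num_accepts / t.num_requests -- force real division
       END AS acceptance_ratio
FROM (
    SELECT c1.date,
           SUM(CASE WHEN c1.action = 'accept_link' AND c2.action IS NOT NULL
                    THEN 1 ELSE 0 END) AS num_accepts,
           SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
    FROM connecting c1
    LEFT JOIN connecting c2
        ON c1.action       = 'accept_link'   AND
           c2.action       = 'request_link'  AND
           c1.sender_id    = c2.recipient_id AND
           c1.recipient_id = c2.sender_id
    GROUP BY c1.date
) t
ORDER BY t.date
"""

def acceptance_ratios(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE connecting "
                 "(date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)")
    conn.executemany("INSERT INTO connecting VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(acceptance_ratios(ROWS))
```

On the sample rows this yields one `(date, ratio)` tuple per day, matching the expected 0, 2 and 1.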

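Note that I use a CASE expression to handle division by zero, which can happen if there are no requests on a given day. I also assume here that the same invitation is never sent more than once; if it were, the LEFT JOIN would return one row per matching request, and the corresponding accept would be counted more than once.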
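If resent invitations are possible, one alternative (swapped in here, not part of the answer above) is to test for a matching request with a correlated `EXISTS` subquery instead of a LEFT JOIN, so each accept is counted at most once however many matching requests exist. The sketch below again uses Python's `sqlite3` as a stand-in for MySQL; the helper `daily_counts` and the extra 2017-01-08 row are hypothetical, added only to simulate a resent request.

```python
import sqlite3

# Sample rows from the question: (date, action, sender_id, recipient_id)
ROWS = [
    ("2017-01-05", "request_link", "frank", "joe"),
    ("2017-01-06", "request_link", "sally", "ann"),
    ("2017-01-07", "request_link", "bill", "ted"),
    ("2017-01-07", "accept_link", "joe", "frank"),
    ("2017-01-06", "accept_link", "ann", "sally"),
    ("2017-01-06", "accept_link", "ted", "bill"),
]

# EXISTS is true or false per accept row, so duplicates cannot inflate it.
QUERY = """
SELECT c1.date,
       SUM(CASE WHEN c1.action = 'accept_link' AND EXISTS (
                    SELECT 1 FROM connecting c2
                    WHERE c2.action = 'request_link'
                      AND c2.sender_id = c1.recipient_id
                      AND c2.recipient_id = c1.sender_id)
                THEN 1 ELSE 0 END) AS num_accepts,
       SUM(CASE WHEN c1.action = 'request_link' THEN 1 ELSE 0 END) AS num_requests
FROM connecting c1
GROUP BY c1.date
ORDER BY c1.date
"""

def daily_counts(rows):
    """Return (date, num_accepts, num_requests) tuples per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE connecting "
                 "(date TEXT, action TEXT, sender_id TEXT, recipient_id TEXT)")
    conn.executemany("INSERT INTO connecting VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

# Hypothetical resent request: frank asks joe again on 2017-01-08.
print(daily_counts(ROWS + [("2017-01-08", "request_link", "frank", "joe")]))
```

Joe's single accept on 2017-01-07 still counts once here. With the LEFT JOIN version, it would join to both of Frank's requests and count twice.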