In a many-to-many join table, how can I count the number of entries shared by two "owners"?

I have a list of movies and a list of tropes. To measure how similar two movies are, I use cosine similarity. If all the weights are equal, it simplifies nicely:

similarity =

(number of shared tropes between both movies)
/
(SQRT(number of tropes from movie 1) * SQRT(number of tropes from movie 2))

For example, if movie 1 has tropes 1, 3, and 4, and movie 2 has tropes 1, 4, 6, and 7, then they share two tropes, and the similarity is

2 / (SQRT(3) * SQRT(4)) = 2 / 3.46... = 0.58

My MySQL tables are pretty standard:

movies:
- id
- ...

tropes:
- id
- ...

movie_tropes:
- movie_id
- trope_id
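
For anyone who wants to reproduce this, here is a minimal sketch of the schema loaded with the example data above. The column types, the keys, and the omission of the other columns are assumptions for illustration, not my actual definitions.

```sql
-- Hypothetical DDL: types and keys are assumed, other columns are left out.
create table movies (id int primary key);
create table tropes (id int primary key);

create table movie_tropes (
    movie_id int not null,
    trope_id int not null,
    primary key (movie_id, trope_id),          -- assumed: one row per (movie, trope)
    foreign key (movie_id) references movies (id),
    foreign key (trope_id) references tropes (id)
);

-- Example data: movie 1 has tropes 1, 3, 4; movie 2 has tropes 1, 4, 6, 7.
insert into movies (id) values (1), (2);
insert into tropes (id) values (1), (3), (4), (6), (7);
insert into movie_tropes (movie_id, trope_id) values
    (1, 1), (1, 3), (1, 4),
    (2, 1), (2, 4), (2, 6), (2, 7);
```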

I can easily count the number of tropes for a single movie:

SELECT count(distinct trope_id) from movie_tropes where movie_id = 1;
SELECT count(distinct trope_id) from movie_tropes where movie_id = 2;

I'm fairly new to SQL. Is there a simple join that counts the number of trope_ids that occur for both movie 1 and movie 2 in this join table?

You can use a self-join:

select count(distinct t1.trope_id)  -- qualified: trope_id exists in both t1 and t2
from movie_tropes t1
inner join movie_tropes t2 on t2.trope_id = t1.trope_id
where t1.movie_id = 1 and t2.movie_id = 2
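
If you later need the shared count for many pairs of movies rather than a single pair, the same self-join can be grouped by both movie ids. This is only a sketch; the output column names `movie_a`, `movie_b`, and `shared_tropes` are chosen here for illustration.

```sql
-- Shared-trope counts for every pair of movies that share at least one trope.
select
    t1.movie_id as movie_a,
    t2.movie_id as movie_b,
    count(distinct t1.trope_id) as shared_tropes
from movie_tropes t1
inner join movie_tropes t2
    on t2.trope_id = t1.trope_id
   and t2.movie_id > t1.movie_id   -- list each unordered pair once, skip self-pairs
group by t1.movie_id, t2.movie_id;
```

Pairs with no tropes in common do not appear in the result, so treat a missing pair as a shared count of zero.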

More generally, though, you can compute all three cardinalities at once using two levels of aggregation. I would recommend:

select 
    sum(has_1) as cnt_1,            -- count of distinct tropes for movie 1
    sum(has_2) as cnt_2,            -- count of distinct tropes for movie 2
    sum(has_1 and has_2) as cnt_both  -- count of distinct tropes for both movies
from (
    select max(movie_id = 1) as has_1, max(movie_id = 2) as has_2
    from movie_tropes t
    where movie_id in (1, 2)
    group by trope_id
) t
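
With the three counts in hand, the similarity can be computed in the same statement by adding one more level. The following is a sketch that assumes the standard equal-weight cosine similarity, where the denominator is the product of the two square roots. The aliases `per_trope` and `counts` and the column `similarity` are names introduced here.

```sql
-- Sketch: cosine similarity of movies 1 and 2, assuming equal weights.
select
    cnt_1,
    cnt_2,
    cnt_both,
    -- nullif turns a zero denominator into NULL when a movie has no tropes
    cnt_both / nullif(sqrt(cnt_1) * sqrt(cnt_2), 0) as similarity
from (
    select
        sum(has_1) as cnt_1,
        sum(has_2) as cnt_2,
        sum(has_1 and has_2) as cnt_both
    from (
        select max(movie_id = 1) as has_1, max(movie_id = 2) as has_2
        from movie_tropes
        where movie_id in (1, 2)
        group by trope_id
    ) per_trope
) counts;
```

For the example in the question (tropes 1, 3, 4 versus 1, 4, 6, 7), the counts are 3, 4, and 2, which gives a similarity of about 0.577. If neither movie has any rows, the counts come back as NULL and so does the similarity.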